visual3d:tutorials:knowledge_discovery:looking_at_large_public_data_sets

Differences

This shows you the differences between two versions of the page.

visual3d:tutorials:knowledge_discovery:looking_at_large_public_data_sets [2024/06/14 17:32] – created sgranger
visual3d:tutorials:knowledge_discovery:looking_at_large_public_data_sets [2024/07/17 15:46] (current) – created sgranger
Line 1: Line 1:
-|**Language:**|** English**  • [[index.php?title=Visual3D_Tutorial:_Looking_at_Large_Public_Datasets/fr&action=edit&redlink=1|français]] • [[index.php?title=Visual3D_Tutorial:_Looking_at_Large_Public_Datasets/it&action=edit&redlink=1|italiano]] • [[index.php?title=Visual3D_Tutorial:_Looking_at_Large_Public_Datasets/pt&action=edit&redlink=1|português]] • [[index.php?title=Visual3D_Tutorial:_Looking_at_Large_Public_Datasets/es&action=edit&redlink=1|español]] ****| +====== Looking at Large Public Data Sets  ======
- +
-|===== Contents =====\\ \\ \\ \\ * [[#Large_Datasets|1 Large Datasets]]\\ * [[#Dataset|2 Dataset]]\\ * [[#File_Naming_Conventions|3 File Naming Conventions]]\\ * [[#Building_Models|4 Building Models]]\\ * [[#Inconsistent_Subject_Prefixes_(Stroke_Survivors)|5 Inconsistent Subject Prefixes (Stroke Survivors)]]\\ * [[#Inconsistent_Subject_Prefixes_(Able_Bodied)|6 Inconsistent Subject Prefixes (Able Bodied)]]\\ * [[#Processing_C3Ds_into_CMZs_(Stroke_Survivors)|7 Processing C3Ds into CMZs (Stroke Survivors)]]\\ * [[#Processing_C3Ds_into_CMZs_(Able-Bodied_Participants)|8 Processing C3Ds into CMZs (Able-Bodied Participants)]]\\ * [[#Missing_Parameters|9 Missing Parameters]]\\ * [[#Conclusion|10 Conclusion]]\\ * [[#References|11 References]]|+
  
 Biomechanics is entering a new age in which large datasets are the norm and processing files individually is tedious and time-consuming. Datasets captured over long periods can be inconsistent, which makes them difficult to process automatically.
Line 10: Line 8:
  
  
-===== Large Datasets =====+==== Large Datasets ====
  
 The dataset used for this tutorial is the raw motion data collected by Van Criekinge et al. [[https://www.nature.com/articles/s41597-023-02767-y|[1]]], which includes full-body motion capture gait data for 138 able-bodied adult participants and 50 stroke survivors. It includes full-body kinematics (PiG model), kinetics, and EMG data across a wide variety of ages, heights and weights. This is an open-source dataset, which allows for free and open science, supports the growing biomechanical research community and makes it easier for scientists to reproduce results.
Line 16: Line 14:
 This dataset is unusual in its size and quality for an open-source dataset. Large datasets are essential for drawing accurate conclusions from scientific observations, and they are becoming the norm within the biomechanical community. Because they are large collections of data, they often pose unique problems, which are explored here. Datasets such as these can be generated over a long period of time, during which people, standards or equipment may change. Large datasets can also be the culmination of several different projects, which leads to further inconsistencies. The most frequent issues we found were due to inconsistencies within the measured dataset. It is difficult to keep a rigid structure in any large dataset, but inconsistencies can wreck the ability to automate tasks, so we’ll show how you can effectively process the data in large datasets like this.
  
-===== Dataset =====+==== Dataset ====
  
 The dataset used for able-bodied participants is available [[https://springernature.figshare.com/articles/dataset/c3d_files_of_3D_full-body_gait_kinematics_kinetics_and_EMG_of_138_able-bodied_adults/24192480?backTo=/collections/A_full-body_motion_capture_gait_dataset_of_138_able-bodied_adults_across_the_life_span_and_50_stroke_survivors/6503791|here]], while the dataset used for stroke survivors is available [[https://springernature.figshare.com/articles/dataset/c3d_files_of_3D_full-body_gait_kinematics_kinetics_and_EMG_of_50_adults_with_stroke/24192483?backTo=/collections/A_full-body_motion_capture_gait_dataset_of_138_able-bodied_adults_across_the_life_span_and_50_stroke_survivors/65037911|here]] (it also contains **SubjectChar.xlsx**, an Excel file with the survivors' parameters).
Line 25: Line 23:
  
  
-===== File Naming Conventions =====+==== File Naming Conventions ====
  
 A common inconsistency in large datasets is the way files are named. Filenames should convey information about a file in a manner that can be easily referenced. For example, calibration files should be named differently from motion files so that they can be separated, and motion files should include a trial number in their filename (if applicable).
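Before automating anything, it can help to check how well a folder of files follows a naming rule. The following is a minimal Python sketch of that idea, not part of the tutorial's Visual3D pipelines. The naming rule it encodes (calibration files contain "static" or "calib", motion files end in a trial number) and the example filenames are assumptions made for illustration, not the dataset's documented scheme.

```python
import re
from collections import defaultdict

# Hypothetical convention (an assumption, not the dataset's documented scheme):
# calibration files contain "static" or "calib"; motion files end in a trial number.
CALIBRATION = re.compile(r"static|calib", re.IGNORECASE)
TRIAL = re.compile(r"\d+\.c3d$", re.IGNORECASE)


def classify(filenames):
    """Group filenames into calibration, motion, unrecognized and other."""
    groups = defaultdict(list)
    for name in filenames:
        if not name.lower().endswith(".c3d"):
            groups["other"].append(name)          # not a C3D file at all
        elif CALIBRATION.search(name):
            groups["calibration"].append(name)    # static / calibration trial
        elif TRIAL.search(name):
            groups["motion"].append(name)         # motion trial with a trial number
        else:
            groups["unrecognized"].append(name)   # C3D file that breaks the rule
    return dict(groups)


if __name__ == "__main__":
    # Made-up filenames, used only to exercise the function.
    example = ["TVC03_static.c3d", "TVC03_walk_02.c3d", "TVC03_walk.c3d", "notes.txt"]
    for group, names in classify(example).items():
        print(group, names)
```

Files that land in the "unrecognized" group are the ones most likely to break an automated pipeline, so they are worth inspecting by hand first.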
Line 51: Line 49:
  
  
-===== Building Models =====+==== Building Models ====
  
 The dataset used the Plug-in Gait marker set. Unfortunately, not all of the standard Plug-in Gait markers were used (for example, the right and left upper arm markers were missing), so only a model of the lower body can be made. See the tutorial [[https://www.c-motion.com/v3dwiki/index.php?title=Tutorial:_Plug-In_Gait_Lower-Limb|here]] to walk through the process of making a lower-body Plug-in Gait model.
Line 62: Line 60:
  
  
-===== Inconsistent Subject Prefixes (Stroke Survivors) =====+==== Inconsistent Subject Prefixes (Stroke Survivors) ====
  
 Most of the markers within the static files provided are prefixed with the subject ID, but the markers in the dynamic files are not. This prevents the built model from linking with the dynamic markers. To begin, we will ensure that all static files include a subject prefix, and then we will prepend this prefix to the markers within the dynamic files.\\
 **Static file marker:**\\
-[[File:stroke_model_prefix.png|{{/images/thumb/2/2c/stroke_model_prefix.png/300px-stroke_model_prefix.png?300x44}}]]\\+{{:stroke_model_prefix.png}}\\
 **Dynamic file marker:**\\
-[[File:stroke_dynamic_no_prefix.png|{{/images/3/35/stroke_dynamic_no_prefix.png?300x45}}]]+{{:stroke_dynamic_no_prefix.png}}
  
 Stroke Survivors 47 and 58 (TVC47 and TVC58) are both missing subject prefixes. To rectify this, run the pipeline file **add_calibration_prefixes.v3s** once for each of these stroke survivors, changing the "/FOLDER" parameter of the "Set_Pipeline_Parameter_To_List_Of_Files" command to the folder containing that stroke survivor's files (TVC47 or TVC58).
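The renaming logic itself is simple, and a short Python sketch may make it clearer. This is an illustration under stated assumptions, not the contents of **add_calibration_prefixes.v3s**: the ":" separator between subject ID and marker name, the function name, and the marker labels in the usage example are all hypothetical.

```python
def add_subject_prefix(labels, subject_id, separator=":"):
    """Return marker labels with the subject prefix prepended where it is missing.

    The ":" separator is an assumption; check the marker names in your own files.
    """
    prefix = f"{subject_id}{separator}"
    # Leave labels that already carry the prefix untouched so the function
    # can be run repeatedly without doubling the prefix.
    return [label if label.startswith(prefix) else prefix + label for label in labels]


if __name__ == "__main__":
    # Made-up marker labels, used only to exercise the function.
    print(add_subject_prefix(["LASI", "TVC47:RASI"], "TVC47"))
```

Skipping labels that already have the prefix keeps the operation safe to repeat, which matters when the same folder is processed more than once.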
Line 232: Line 230:
 ''%%; %%''
  
-===== Inconsistent Subject Prefixes (Able Bodied) =====+==== Inconsistent Subject Prefixes (Able Bodied) ====
  
 The markers in the static files of the able-bodied participants do not have subject prefixes, but a random selection of the dynamic file markers do. In this case it makes sense to strip the subject prefixes from the files that have them instead of adding them to every other file. This can be done using the "remove_subject_prefixes_healthy.v3s" pipeline file.
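As a rough illustration of the stripping step, the Python sketch below removes prefixes from marker labels and reports which files were affected. It is not the contents of "remove_subject_prefixes_healthy.v3s"; the ":" separator, the function name, and the filenames and labels in the usage example are assumptions.

```python
def strip_subject_prefixes(labels_by_file, separator=":"):
    """Strip subject prefixes from marker labels, file by file.

    Returns the cleaned labels and the list of files that had prefixed labels.
    The ":" separator is an assumption about how prefixes are written.
    """
    cleaned = {}
    changed = []
    for filename, labels in labels_by_file.items():
        # Keep only the text after the first separator; labels without a
        # separator are returned unchanged.
        new_labels = [label.split(separator, 1)[-1] for label in labels]
        if new_labels != labels:
            changed.append(filename)
        cleaned[filename] = new_labels
    return cleaned, changed


if __name__ == "__main__":
    # Made-up filenames and labels, used only to exercise the function.
    data = {"walk_01.c3d": ["SUBJ01:LASI", "SUBJ01:RASI"], "walk_02.c3d": ["LASI", "RASI"]}
    cleaned, changed = strip_subject_prefixes(data)
    print(cleaned)
    print("files with prefixes:", changed)
```

Reporting the affected files makes it easy to confirm that only the expected subset of dynamic files was modified.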
Line 262: Line 260:
  
  
-===== Processing C3Ds into CMZs (Stroke Survivors) =====+==== Processing C3Ds into CMZs (Stroke Survivors) ====
  
 Instead of manually processing each C3D file, we can use another pipeline to automate the process:
Line 456: Line 454:
 ''%%; %%''
  
-===== Processing C3Ds into CMZs (Able-Bodied Participants) =====+==== Processing C3Ds into CMZs (Able-Bodied Participants) ====
  
 Since the naming convention for the able-bodied participants differs from that of the stroke survivors, use the "process_c3ds_healthy.v3s" pipeline instead of the pipeline used for the stroke survivors. The model information needed is stored in the calibration file rather than the dynamic file. In addition, the subject IDs are not consistent across all the files, so a new naming convention must be created for the CMZ files.\\
Line 706: Line 704:
 ''%%; %%''
  
-===== Missing Parameters =====+==== Missing Parameters ====
  
 On inspection, some of the dynamic files were missing parameters needed to calculate the model (height, mass, ankle widths, and knee widths) or had them incorrectly named. For the stroke participants, this data was provided in a spreadsheet, so the values can be entered manually and the models recalculated.
  
-[[File:missing.jpg|{{/images/c/ca/missing.jpg?402x492}}]]+{{:missing.jpg}}
  
 Open the CMZ for one of Stroke Survivors 24, 34, or 57 (TVC_24.cmz, TVC_34.cmz, or TVC_57.cmz). Edit the "Subject Data/Metrics" **Height, Mass, Left_Knee_Width, Right_Knee_Width, Left_Ankle_Width, and Right_Ankle_Width** to match those in the file **SubjectChar.xlsx**.
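Finding which subjects need this manual fix can be sketched in Python as a simple completeness check. The metric names below come from the list above; the function name, the dictionary layout, and the numeric values in the usage example are assumptions for illustration and do not come from the dataset.

```python
# Metric names taken from the tutorial text; everything else is illustrative.
REQUIRED_METRICS = (
    "Height",
    "Mass",
    "Left_Knee_Width",
    "Right_Knee_Width",
    "Left_Ankle_Width",
    "Right_Ankle_Width",
)


def find_missing_metrics(metrics_by_subject):
    """Return {subject: [missing metric names]} for subjects with gaps.

    A metric counts as missing if its key is absent or its value is None.
    An incorrectly named metric therefore also shows up as missing.
    """
    report = {}
    for subject, metrics in metrics_by_subject.items():
        missing = [name for name in REQUIRED_METRICS if metrics.get(name) is None]
        if missing:
            report[subject] = missing
    return report


if __name__ == "__main__":
    # Made-up values, used only to exercise the function.
    subjects = {
        "TVC_24": {"Height": 1.0, "Mass": 1.0},
        "TVC_01": {name: 1.0 for name in REQUIRED_METRICS},
    }
    print(find_missing_metrics(subjects))
```

Subjects that appear in the report are the ones whose values would then be copied across from the spreadsheet.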
Line 721: Line 719:
 After completing this, you will have a completely processed dataset! You can follow other [[https://www.c-motion.com/v3dwiki/index.php?title=Category:Tutorials|tutorials]] to see what analyses you might run on this data afterwards.
  
-===== Conclusion =====+==== Conclusion ====
  
 In this tutorial we have identified common pitfalls that may occur when processing a large biomechanics dataset. Using the public dataset from Van Criekinge et al. as an example, we have described the challenges encountered and presented realistic solutions. Just like the underlying dataset, all of our files are openly available. You are encouraged to work with this dataset as we did and to build upon our work by analyzing the post-processed data.
  
-===== References =====+==== References ====
  
 **Paper:** Van Criekinge et al., "A full-body motion capture gait dataset of 138 able-bodied adults across the life span and 50 stroke survivors": [[https://www.nature.com/articles/s41597-023-02767-y|[2]]]
  
- 
-Retrieved from "" 
  
  
visual3d/tutorials/knowledge_discovery/looking_at_large_public_data_sets.1718386330.txt.gz · Last modified: 2024/06/14 17:32 by sgranger