visual3d:tutorials:knowledge_discovery:looking_at_large_public_data_sets
visual3d:tutorials:knowledge_discovery:looking_at_large_public_data_sets [2024/07/17 15:42] – removed sgranger | visual3d:tutorials:knowledge_discovery:looking_at_large_public_data_sets [2024/07/17 15:46] (current) – created sgranger | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | ====== Looking at Large Public Data Sets ====== | ||
+ | |||
+ | Biomechanics is entering a new age in which large datasets are the norm, and processing each file individually is tedious and time-consuming. Captured over long periods of time, datasets can be inconsistent, | ||
+ | |||
+ | In this tutorial, we work through common issues seen in large public biomechanics datasets, using a large released dataset that tracks the full-body gait of both stroke survivors and able-bodied participants. | ||
+ | |||
+ | Sections highlighted like this show when actions should be performed, if you are following along with the tutorial. | ||
+ | |||
+ | |||
+ | ==== Large Datasets ==== | ||
+ | |||
+ | The dataset used for this tutorial is the raw motion data collected by (Van Criekinge et al [[https:// | ||
+ | |||
+ | This dataset is unique in its size and quality for an open source dataset. Large datasets are essential for scientific observations to be able to accurately draw conclusions, | ||
+ | |||
+ | ==== Dataset ==== | ||
+ | |||
+ | The dataset used for able-bodied participants is available [[https:// | ||
+ | |||
+ | The authors have made an already-processed dataset available for use, but for the purposes of this tutorial, we shall be using the original, raw data. | ||
+ | |||
+ | Please find all of the needed pipeline and model files [[https:// | ||
+ | |||
+ | |||
+ | ==== File Naming Conventions ==== | ||
+ | |||
+ | File naming is a common source of inconsistency in large datasets. Filenames should convey information about a file in a manner that can be easily referenced; e.g., calibration files should be named differently from motion files so that they can be separated, and motion files should have a trial number in their file name (if applicable), | ||
+ | |||
+ | In this dataset, we found inconsistencies within filenames, as well as an error relating to a file name. Particularly within the stroke survivors, there was no consistent naming convention across the motion files, other than that all of them include the letters “BWA”. The trial number was conveyed in a variety of ways, including separating it with a space, an underscore, or nothing, as well as prepending a “0” to the trial number. Ultimately, this forces the user to use wildcards to collect all the motion files at once, and limits their ability to select motion files individually. | ||
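To illustrate why inconsistent separators and leading zeros make individual file selection awkward, here is a minimal Python sketch (outside Visual3D) that extracts a trial number from several naming variants. The example file names and the regular expression are assumptions made for illustration; they are not copied from the dataset and may not cover every variant it contains.

```python
import re

# Hypothetical file names illustrating the kinds of variation described above;
# they are assumptions, not names copied from the dataset.
EXAMPLE_NAMES = ["BWA 1.c3d", "BWA_02.c3d", "BWA3.c3d", "BWA (0).c3d"]

# "BWA", an optional space or underscore, optional parentheses,
# and the trial number with any leading zeros ignored.
TRIAL_PATTERN = re.compile(r"^BWA[ _]?\(?0*(\d+)\)?\.c3d$", re.IGNORECASE)


def trial_number(filename):
    """Return the trial number encoded in a file name, or None if it does not match."""
    match = TRIAL_PATTERN.match(filename)
    return int(match.group(1)) if match else None


# Map every example name to its parsed trial number.
parsed = {name: trial_number(name) for name in EXAMPLE_NAMES}
```

A normalizing step like this is one way to recover per-trial selection when a dataset cannot be renamed; with consistent names, a plain wildcard would be enough.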
+ | |||
+ | There are several files which need to be altered or deleted: | ||
+ | |||
+ | \\ | ||
+ | |||
+ | |||
+ | Stroke Survivor 47 (TVC47) has a mislabeled calibration file. This must be renamed from “BWA (0).c3d” to something following their calibration conventions, | ||
+ | |||
+ | |||
+ | \\ | ||
+ | |||
+ | |||
+ | Stroke Survivor 38 (TVC38) does not have a valid calibration file. The corresponding folder should be deleted (TVC38). | ||
+ | |||
+ | |||
+ | \\ | ||
+ | |||
+ | |||
+ | Able-Bodied Participants 37 (SUBJ37), 42 (SUBJ42), and 118-138 (SUBJ118-SUBJ138) follow a different model to mark the pelvis, or are missing necessary data. The corresponding folders should be deleted (SUBJ37, SUBJ42, SUBJ118-SUBJ138). | ||
+ | |||
+ | |||
+ | ==== Building Models ==== | ||
+ | |||
+ | The dataset used the Plug-in Gait marker set. Unfortunately, not all of the standard Plug-in Gait markers were used (for example, the right and left upper arm markers were missing), so only a model of the lower body can be made; see the tutorial [[https:// | ||
+ | |||
+ | The subset of able-bodied participants used in this tutorial has three markers for the pelvis, as opposed to the four used for the stroke survivors, so two separate models need to be made. The default values provided in the tutorial can be used to build the models, as the subject data will be entered via the processing pipeline. | ||
+ | |||
+ | For the Stroke Survivors, it is important to build the original model without any subject-prefixes (managed later in this tutorial). Participant 58 should be used for this, as their calibration file does not have any attached subject-prefixes in the raw data. | ||
+ | |||
+ | The respective models we used have been provided in this tutorial's files: **stroke_model.mdh** and **healthy_model.mdh**. | ||
+ | |||
+ | |||
+ | ==== Inconsistent Subject Prefixes (Stroke Survivors) ==== | ||
+ | |||
+ | Most of the markers within the static files provided are prefixed with the subject ID; however, the dynamic file markers are not. This will prevent the built model from linking with the dynamic markers. To begin, we will ensure that all static files include a subject-prefix, | ||
+ | **Static file marker:**\\ | ||
+ | {{: | ||
+ | **Dynamic file marker:**\\ | ||
+ | {{: | ||
+ | |||
+ | Stroke Survivors 47 and 58 (TVC47 and TVC58) are both missing subject prefixes. To rectify, run the pipeline file **add_calibration_prefixes.v3s** once for each of these stroke survivors, changing the "/ | ||
+ | |||
+ | |||
+ | The pipeline " | ||
+ | |||
+ | Run the pipeline **convert_subject_prefix_stroke.v3s**, | ||
+ | |||
+ | |||
+ | Below is an explanation of the important sections of the file " | ||
+ | |||
+ | **Set a pipeline parameter to a list of all dynamic files contained in the " | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | \\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Loop through all dynamic files:**\\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Open the current file:**\\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Get the Participant ID from PARAMETERS: | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Get the preceding half of the Participant ID using the STRING_LEFT expression. Since all IDs begin with BWA, we can hard-code the index value:**\\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Get the length of the participant id using the STRING_LENGTH expression: | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Get the index of the dash using the STRING_FIND expression**\\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | \\ | ||
+ | **Subtract the DASH_INDEX from SUBJECT_ID_LENGTH to get the negative index of the dash, then subtract one from that value to get the value needed for STRING_RIGHT (the negative index of the character immediately following the dash):**\\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Get the characters in the subject id following the dash using STRING_RIGHT: | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Concatenate LEFT_SUBJECT_ID and RIGHT_SUBJECT_ID with an underscore between them:**\\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
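The string steps above (take the left part, find the dash, work out how many characters follow it, then rejoin with an underscore) can be sketched in Python as follows. This is an illustration of the arithmetic only, not the Visual3D pipeline itself: `string_left` and `string_right` are Python stand-ins for the STRING_LEFT and STRING_RIGHT expressions, the zero-based dash index is an assumption, and the ID `"BWA-12"` is a hypothetical example.

```python
def string_left(text, count):
    """Python stand-in for a 'leftmost count characters' expression."""
    return text[:count]


def string_right(text, count):
    """Python stand-in for a 'rightmost count characters' expression."""
    return text[len(text) - count:] if count > 0 else ""


def convert_subject_id(subject_id, left_count=3):
    """Replace the dash in an ID such as 'BWA-12' with an underscore."""
    left = string_left(subject_id, left_count)   # e.g. the hard-coded "BWA" part
    length = len(subject_id)                     # SUBJECT_ID_LENGTH
    dash_index = subject_id.find("-")            # DASH_INDEX (zero-based here: an assumption)
    if dash_index < 0:
        raise ValueError(f"no dash found in subject ID {subject_id!r}")
    # length - dash_index is the dash's position counted from the end;
    # one less than that is the number of characters after the dash.
    right_count = length - dash_index - 1
    right = string_right(subject_id, right_count)
    return left + "_" + right


converted = convert_subject_id("BWA-12")  # hypothetical ID
```

If the expression language counts positions from one rather than zero, the final subtraction would need adjusting accordingly.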
+ | |||
+ | |||
+ | \\ | ||
+ | **Get a list of all target names**\\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Using the updated subject ID and the list of target names, use the modify participants parameters pipeline function to add the needed subject prefixes, ensuring OVERWRITE_C3D_FILE is set to true so the changes are preserved: | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Clear the workspace to preserve memory, and then close the loop:**\\ | ||
+ | '' | ||
+ | '' | ||
+ | \\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | ==== Inconsistent Subject Prefixes (Able Bodied) ==== | ||
+ | |||
+ | The markers in the static files of the able-bodied participants do not have subject prefixes; however, a seemingly random selection of the dynamic file markers do have prefixes. In this case, it makes sense to strip the subject prefixes from the files that have them instead of adding them to every other file. This can be done using the " | ||
+ | |||
+ | Run the pipeline **remove_subject_prefixes_healthy.v3s**, | ||
+ | |||
+ | |||
+ | Because the prefixes are being removed instead of added, the pipeline is much simpler. Below is the primary change to the pipeline:\\ | ||
+ | \\ | ||
+ | **Using the modify C3D subjects parameters pipeline function, clear the participant prefixes from the files. Set OVERWRITE_C3D_FILE to true to ensure the changes are preserved, and keep all other parameters at their default values:**\\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
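The effect of clearing the prefixes can be pictured with a small Python sketch that strips a subject prefix from marker labels and leaves unprefixed labels untouched. It is only an illustration of the idea: the colon separator, the `SUBJ01` prefix, and the marker names are assumptions, and the real change is made by the pipeline command on the C3D file parameters.

```python
def strip_subject_prefix(labels, separator=":"):
    """Remove everything up to and including the first separator in each label.

    Labels without the separator are returned unchanged, which mirrors the
    situation where only some files carry a prefix.
    """
    stripped = []
    for label in labels:
        _prefix, found, name = label.partition(separator)
        stripped.append(name if found else label)
    return stripped


# Hypothetical marker labels: two prefixed, one already clean.
example_labels = ["SUBJ01:LASI", "RASI", "SUBJ01:SACR"]
clean_labels = strip_subject_prefix(example_labels)
```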
+ | |||
+ | |||
+ | ==== Processing C3Ds into CMZs (Stroke Survivors) ==== | ||
+ | |||
+ | Instead of manually processing each C3D file, we can use another pipeline to automate the process: | ||
+ | |||
+ | Because the parameters needed to accurately calculate the models are stored within the dynamic files, they must be opened first and the data saved to pipeline parameters so it can then be applied to the model. | ||
+ | |||
+ | Edit the pipeline command " | ||
+ | |||
+ | Run the pipeline **process_c3ds_stroke.v3s**, | ||
+ | |||
+ | |||
+ | It should also be mentioned that this dataset does not have reliable enough force plate data to accurately create gait events through kinetic events, and as such a kinematic model is used to generate these. This is done using the Zeni Method 1a (foot position relative to pelvis) from this [[Visual3D: | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Set a pipeline parameter to the current folder path**\\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Get a list of all folders within the 50_StrokePiG folder:**\\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Loop through the folders: | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Calibration files are prefixed with TVC, set a pipeline parameter to all C3D files that start with TVC (there should only be one in each folder): | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
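Since the step above relies on exactly one file per folder matching the calibration pattern, it can be useful to verify that assumption before processing. The following Python sketch shows one way to do so outside Visual3D; the function name and the example file names are hypothetical.

```python
from fnmatch import fnmatchcase


def find_calibration_file(filenames, pattern="TVC*.c3d"):
    """Return the single file name matching the calibration pattern.

    Raises ValueError when zero or several files match, since either case
    means the folder does not follow the expected convention.
    """
    matches = [name for name in filenames if fnmatchcase(name, pattern)]
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one file matching {pattern!r}, found {len(matches)}"
        )
    return matches[0]


# Hypothetical folder listing: one calibration file and two motion files.
calibration = find_calibration_file(["TVC03.c3d", "BWA 1.c3d", "BWA_02.c3d"])
```

The same check works with other patterns. A suffix pattern such as `"*0).c3d"` would also match a trial numbered 10, which is one reason to insist on exactly one match rather than taking the first.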
+ | |||
+ | |||
+ | \\ | ||
+ | **Start a new workspace: | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Open all dynamic files within the current folder:**\\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Set all dynamic files to active:**\\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Save all the necessary metrics from PARAMETERS: | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | //'' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Visual3D's expected unit of measurement is meters, so the knee widths, ankle widths and height need to be converted from mm to m:**\\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | //'' | ||
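As a quick sanity check on the conversion, the arithmetic is simply a division by 1000. The Python sketch below uses made-up values and hypothetical metric names purely for illustration.

```python
MM_PER_M = 1000.0


def mm_to_m(value_mm):
    """Convert a length from millimeters to meters."""
    return value_mm / MM_PER_M


# Hypothetical subject metrics in millimeters (not taken from the dataset).
metrics_mm = {"height": 1720.0, "knee_width": 105.0, "ankle_width": 70.0}

# The same metrics expressed in meters.
metrics_m = {name: mm_to_m(value) for name, value in metrics_mm.items()}
```

A height that still reads in the thousands after this step is a strong hint that the conversion was skipped.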
+ | |||
+ | |||
+ | \\ | ||
+ | **Get the subject ID from PARAMETERS: | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Create a hybrid model using the calibration file:**\\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Apply the model template: | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Set all the necessary model metrics and recalculate the model:**\\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | //'' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Assign the model to the dynamic files:**\\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Set all files to active:**\\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Filter and Interpolate available data**\\ | ||
+ | '' | ||
+ | |||
+ | \\ | ||
+ | **Calculate Gait Events through the use of Kinematic signals**\\ | ||
+ | '' | ||
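For readers unfamiliar with kinematic event detection, the general idea of a foot-position-relative-to-pelvis approach can be sketched in Python as follows: heel strike is taken at the peaks of the heel position relative to the pelvis along the direction of travel, and toe off at the troughs of the toe position relative to the pelvis. This is a simplified sketch under stated assumptions (walking in the positive direction of a single anterior-posterior axis, clean signals, synthetic data); it is not Visual3D's implementation.

```python
import math


def local_maxima(signal):
    """Return the indices of interior samples that are local peaks."""
    return [
        i
        for i in range(1, len(signal) - 1)
        if signal[i - 1] < signal[i] >= signal[i + 1]
    ]


def kinematic_gait_events(heel_ap, toe_ap, pelvis_ap):
    """Estimate heel-strike and toe-off frames from anterior-posterior positions."""
    heel_rel = [h - p for h, p in zip(heel_ap, pelvis_ap)]
    toe_rel = [t - p for t, p in zip(toe_ap, pelvis_ap)]
    heel_strikes = local_maxima(heel_rel)            # heel furthest ahead of the pelvis
    toe_offs = local_maxima([-v for v in toe_rel])   # toe furthest behind the pelvis
    return heel_strikes, toe_offs


# Synthetic data: the pelvis moves forward steadily while the foot
# oscillates around it with a period of 100 frames.
FRAMES = 200
pelvis = [0.01 * i for i in range(FRAMES)]
swing = [0.3 * math.sin(2 * math.pi * i / 100) for i in range(FRAMES)]
heel = [p + s for p, s in zip(pelvis, swing)]
toe = [p + s + 0.15 for p, s in zip(pelvis, swing)]

heel_strikes, toe_offs = kinematic_gait_events(heel, toe, pelvis)
```

Real marker data would normally be filtered first, as in the preceding pipeline step, because noise creates spurious peaks.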
+ | |||
+ | \\ | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Run all desired computations and then save the workspace to a CMZ using the subject ID as the file name and end the loop**\\ | ||
+ | '' | ||
+ | \\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | \\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | ==== Processing C3Ds into CMZs (Able-Bodied Participants) ==== | ||
+ | |||
+ | Since the naming convention for the able-bodied participants differs from that of the stroke survivors, the " | ||
+ | |||
+ | |||
+ | Edit the pipeline command " | ||
+ | |||
+ | Run the pipeline **process_c3ds_healthy.v3s**, | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Set a pipeline parameter to 0. This will be used for naming the CMZ file, and its value will be incremented within the loop:**\\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Set a pipeline parameter to a list of folders containing C3Ds within the 138_Healthy_PiG_10.05 folder:**\\ | ||
+ | \\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | \\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Loop through the folders: | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Calibration files are numbered (0), so get a list of all files ending with "0)" (there should only be one in each folder): | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Open a new workspace: | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Open the calibration file:**\\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Set the calibration file to active:**\\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Save all the necessary information stored in PARAMETERS: | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | //'' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Visual3D's expected unit of measurement is meters, so the knee widths, ankle widths and height need to be converted from mm to m:**\\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | //'' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Close the calibration file:**\\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **To open the dynamic files without re-opening the calibration file, list all C3D files within the current folder:**\\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Loop through each file:**\\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Use a conditional statement to check that the current file name does not match the calibration file's name:**\\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **If the file names do not match, open the selected file:**\\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Close the conditional statement and the loop**\\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | \\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Set all dynamic files to active:**\\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Create a hybrid model using the calibration file:**\\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Apply the model template: | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Set the necessary model metrics using the parameters saved from the calibration file and then recalculate the model:**\\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | //'' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Assign model to the dynamic files:**\\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Run all desired computations, | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | |||
+ | \\ | ||
+ | **Save the workspace as a CMZ appending the SUB_NUM variable to the name, then close the loop:**\\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | \\ | ||
+ | '' | ||
+ | '' | ||
+ | '' | ||
+ | |||
+ | ==== Missing Parameters ==== | ||
+ | |||
+ | Upon inspection, some of the dynamic files were missing, or had incorrectly named, parameters needed to calculate the model (height, mass, ankle widths, and knee widths). In the case of the stroke participants, this data was provided in a spreadsheet, so the values can be entered manually and the models recalculated. | ||
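A simple way to find which subjects need manual attention is to check each subject's stored parameters against the list of required metrics. The Python sketch below illustrates the idea; the key names and the example values are hypothetical and do not reflect the parameter names used in the C3D files.

```python
# Hypothetical names for the metrics the model needs.
REQUIRED_METRICS = (
    "height",
    "mass",
    "left_ankle_width",
    "right_ankle_width",
    "left_knee_width",
    "right_knee_width",
)


def missing_metrics(parameters):
    """Return the required metrics that are absent or empty for one subject."""
    return [name for name in REQUIRED_METRICS if parameters.get(name) is None]


# Made-up example: mass is absent and one knee width has no value.
example_parameters = {
    "height": 1.72,
    "left_ankle_width": 0.07,
    "right_ankle_width": 0.07,
    "left_knee_width": 0.105,
    "right_knee_width": None,
}
to_fix = missing_metrics(example_parameters)
```

Note that a misnamed parameter shows up here as missing, which matches how it would behave when the model is calculated.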
+ | |||
+ | {{: | ||
+ | |||
+ | Open the CMZ of one of Stroke Survivors 24, 34, or 57 (TVC_24.cmz, | ||
+ | |||
+ | Run the pipeline **pipeline_manual.v3s** while the CMZ is open to recalculate the desired metrics. | ||
+ | |||
+ | Do this separately for all three stroke survivors. | ||
+ | |||
+ | |||
+ | After completing this, you will have a completely processed dataset! You can follow other [[https:// | ||
+ | |||
+ | ==== Conclusion ==== | ||
+ | |||
+ | Through this tutorial we have identified the common pitfalls that may occur when processing a large biomechanics dataset. Using the public dataset from Van Criekinge et al. as an example, we have identified the challenges encountered and presented realistic solutions. Just like the underlying data set, we made all of our files openly available. You are encouraged to work with this dataset as we did and to build upon our work by analyzing the post-processed data. | ||
+ | |||
+ | ==== References ==== | ||
+ | |||
+ | **Paper:** Van Criekinge et al. A full-body motion capture gait dataset of 138 able-bodied adults across the life span and 50 stroke survivors: [[https:// | ||
+ | |||
+ | |||
visual3d/tutorials/knowledge_discovery/looking_at_large_public_data_sets.1721230970.txt.gz · Last modified: 2024/07/17 15:42 by sgranger