sift:tutorials:clean_your_data
sift:tutorials:clean_your_data [2024/07/12 13:27] – removed sgranger | sift:tutorials:clean_your_data [2024/11/28 19:13] (current) – [Clean your Data] wikisysop | ||
---|---|---|---|
+ | ====== Clean your Data ====== | ||
+ | |||
+ | This tutorial will show you how to use Sift as a data cleaning, or quality assurance, tool. You will learn how to check for and correct faulty force assignments across all files at once. This is particularly useful for lab managers (or supervisors) who may not be familiar with the raw data, or collaborators who were not involved in the collection process. | ||
+ | |||
+ | If you prefer, a video tutorial outlining the same process is available at this link: [[https:// | ||
+ | ==== Data ==== | ||
+ | |||
+ | This tutorial uses overground walking data from four subjects. The subjects walked at three different speeds: slow, normal, and fast. The data was analyzed using a pipeline that included an automatic gait event detection command. [[Visual3D: | ||
+ | |||
+ | These data files can be downloaded using this link: [[https:// | ||
+ | |||
+ | ==== Loading the library ==== | ||
+ | |||
+ | {{ : | ||
+ | |||
+ | 1. Click {{: | ||
+ | |||
+ | 2. Click {{: | ||
+ | |||
+ | 3. Click {{: | ||
+ | |||
+ | This step selects the path to the data you are using. If you intend to modify the data, e.g., by correcting invalid assignments, | ||
+ | |||
+ | NOTE: After the library path is set and loaded, files from the selected folder and sub-folders will be loaded and parsed. Sift will show a summary of the loaded files and what they contain in the [[Sift: | ||
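As a rough illustration of what loading a folder and its sub-folders involves, the sketch below walks a directory tree and counts the .cmz files found in each folder. It is not Sift's implementation; the function name `summarize_library` and its return format are assumptions made for this example.

```python
from collections import Counter
from pathlib import Path


def summarize_library(root, extension=".cmz"):
    """Count files with the given extension in root and all of its sub-folders.

    Hypothetical helper: returns {folder relative to root: file count}.
    """
    root = Path(root)
    counts = Counter()
    # rglob descends into every sub-folder, mirroring the recursive load
    for path in sorted(root.rglob(f"*{extension}")):
        if path.is_file():
            counts[str(path.parent.relative_to(root))] += 1
    return dict(counts)
```

Calling `summarize_library` on the library folder gives a quick count per folder that can be compared against the summary Sift displays. Files sitting directly in the library folder are reported under the key `"."`.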
+ | |||
+ | |||
+ | ==== Defining queries and calculating groups ==== | ||
+ | |||
+ | 1. Navigate to the [[Sift: | ||
+ | |||
+ | 2. To create a new query definition, click the {{: | ||
+ | |||
+ | 2.1. Type GRF in the **Query Name** text box in the top-right and click **Save**. | ||
+ | |||
+ | {{: | ||
+ | |||
+ | 3. While the GRF query is selected, click the {{: | ||
+ | |||
+ | 3.1. Type R_GRF in the **Condition Name** text box in the top-right. | ||
+ | |||
+ | 3.2. Three tabs now need to be completed to define the condition. | ||
+ | |||
+ | 3.2.1. **Signals**: | ||
+ | |||
+ | {{: | ||
+ | |||
+ | 3.2.2. **Events**: This tab allows the user to specify the desired event sequence to extract data from. For instance, the right gait cycle could be extracted using the event sequences RON, ROFF or RHS, RHS. For this tutorial select the RON and ROFF events and leave the default normalization values of 101 for the number of points and cubic for the spline type. | ||
+ | |||
+ | {{: | ||
+ | |||
+ | 3.2.3. **Refinements**: | ||
+ | |||
+ | {{: | ||
+ | |||
+ | 3.3. Click {{: | ||
+ | |||
+ | 4. For this tutorial, define a second condition within the same query to account for the left side. This can be done by modifying the existing condition: | ||
+ | |||
+ | 4.1. While the GRF query is selected, click the {{: | ||
+ | |||
+ | 4.1.1. In the Signals tab, change the NAME to L_GRF and leave the other parameters as they are. | ||
+ | |||
+ | 4.1.2. In the Events tab, select the event sequence LON and LOFF. | ||
+ | |||
+ | 4.1.3. Leave the Refinements tab as it was. | ||
+ | |||
+ | 4.2. Click {{: | ||
+ | |||
+ | {{: | ||
+ | |||
+ | You should now see one query (GRF) in the **Queries** list and two conditions (R_GRF and L_GRF) in the **Conditions** list. At this point in the tutorial these definitions have been created, but they have not yet been applied to the signals in the loaded library. To apply them, click on **Calculate All Queries** (or **Calculate Selected Queries**, since there is only one query in this tutorial). | ||
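The relationship between the query and its two conditions can be pictured as a small data structure. The sketch below is only a mental model, not Sift's internal representation: the class and field names (`Query`, `Condition`, `start_event`, `end_event`, `num_points`, `spline`) are assumptions, while the values come from the steps above.

```python
from dataclasses import dataclass, field


@dataclass
class Condition:
    """One condition: an event sequence plus normalization settings (hypothetical model)."""
    name: str
    start_event: str
    end_event: str
    num_points: int = 101   # normalization length left at its default in this tutorial
    spline: str = "cubic"   # spline type left at its default in this tutorial


@dataclass
class Query:
    """A named query holding one or more conditions (hypothetical model)."""
    name: str
    conditions: list = field(default_factory=list)


# The definitions created in this tutorial: one query, two conditions
grf = Query("GRF", [
    Condition("R_GRF", "RON", "ROFF"),
    Condition("L_GRF", "LON", "LOFF"),
])
```

Modelled this way, the left-side condition differs from the right-side one only in its name and event sequence, which is why step 4 reuses the existing definition.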
+ | |||
+ | Once the query is calculated, you can close the Query Builder dialog. It is now time to examine the results visually by plotting them. | ||
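To make the normalization setting concrete, the sketch below cuts a signal between two event frames and resamples it to 101 points. For brevity it uses linear interpolation rather than the cubic spline selected in the tutorial, and the function names and frame-index arguments are assumptions for illustration, not part of Sift.

```python
def normalize_trace(samples, num_points=101):
    """Resample samples to num_points evenly spaced values (linear interpolation)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to interpolate")
    if num_points < 2:
        raise ValueError("num_points must be at least 2")
    last = len(samples) - 1
    out = []
    for i in range(num_points):
        pos = i * last / (num_points - 1)   # fractional index into samples
        lo = min(int(pos), last - 1)        # left neighbour, clamped at the end
        frac = pos - lo
        out.append(samples[lo] + frac * (samples[lo + 1] - samples[lo]))
    return out


def extract_and_normalize(signal, start_frame, end_frame, num_points=101):
    """Cut signal between two event frames (inclusive) and time-normalize it."""
    return normalize_trace(signal[start_frame:end_frame + 1], num_points)
```

Because every extracted trace ends up with the same number of points, traces from trials of different durations can be overlaid on a single plot, which is what makes the visual inspection in the next section possible.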
+ | |||
+ | |||
+ | ==== Visualizing and exploring your data ==== | ||
+ | |||
+ | 1. With the groups calculated, the specific traces that have been extracted from the loaded library' | ||
+ | |||
+ | 2. Make sure the plot type is set to Signal-Time. | ||
+ | |||
+ | 3. Select the group GRF and select all workspaces. | ||
+ | |||
+ | 4. Open the [[Sift: | ||
+ | |||
+ | 5. Select "plot All Traces" | ||
+ | |||
+ | Your graph should now resemble the image below. | ||
+ | |||
+ | {{: | ||
+ | |||
+ | |||
+ | 6. Use your cursor to select only the traces on the graph that you wish to inspect. Click on single traces in order to examine individual curves without the other curves ' | ||
+ | |||
+ | {{: | ||
+ | |||
+ | |||
+ | ==== Verifying raw data and excluding traces ==== | ||
+ | |||
+ | Once the queried traces have been plotted, you can begin cleaning the data. In this example, it is evident that some incorrect force assignments exist. | ||
+ | |||
+ | 1. In order to exclude these incorrect assignments, | ||
+ | |||
+ | 2. Right-click and navigate to Exclude, at the very bottom, and click Exclude Trace (raw data). | ||
+ | |||
+ | |||
+ | 3. Verify that your desired traces have been properly excluded. | ||
+ | |||
+ | 3.1 When traces are excluded, two notable differences should appear in the Queried Data subwindow. First, the previously selected, now-excluded data should no longer be visible on the graph. Second, the **Workspaces** widget will indicate when traces have been excluded. Specifically, | ||
+ | |||
+ | {{: | ||
+ | |||
+ | 4. If you want to visualize the data you have excluded alongside the remaining data, go to the [[Sift: | ||
+ | |||
+ | {{: | ||
+ | |||
+ | |||
+ | 5. Traces can also be examined more closely before deciding to include or exclude them. | ||
+ | |||
+ | 5.1 Select the traces of interest by clicking and dragging on the plot. | ||
+ | |||
+ | 5.2 Click the {{: | ||
+ | |||
+ | 5.3 The Show [[Sift: | ||
+ | |||
+ | 6. Once all of the desired exclusions have been made, the original data is ready to be updated. | ||
+ | |||
+ | {{: | ||
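The exclusion workflow above amounts to keeping a flag per trace and filtering on it when plotting. The sketch below models that bookkeeping under stated assumptions: the `TraceSet` class, its method names, and the workspace and trace identifiers used with it are hypothetical and do not reflect how Sift stores exclusions.

```python
class TraceSet:
    """Track which traces are excluded, keyed by (workspace, trace id)."""

    def __init__(self):
        self._excluded = {}  # (workspace, trace_id) -> True if excluded

    def add(self, workspace, trace_id):
        """Register a trace as included."""
        self._excluded[(workspace, trace_id)] = False

    def exclude(self, workspace, trace_id):
        """Flag a known trace as excluded; the trace itself is not deleted."""
        key = (workspace, trace_id)
        if key not in self._excluded:
            raise KeyError(key)
        self._excluded[key] = True

    def visible(self, show_excluded=False):
        """Traces to plot; excluded ones appear only when show_excluded is True."""
        return [key for key, flagged in self._excluded.items()
                if show_excluded or not flagged]

    def workspaces_with_exclusions(self):
        """Workspaces that contain at least one excluded trace."""
        return sorted({ws for (ws, _), flagged in self._excluded.items() if flagged})
```

Keeping the excluded traces in the set, rather than deleting them, mirrors the behaviour described above: they disappear from the default plot but can still be displayed alongside the remaining data.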
+ | |||
+ | |||
+ | ==== Updating the original data ==== | ||
+ | |||
+ | {{: | ||
+ | |||
+ | 1. Now that the incorrect force assignments have been identified in our data set, it is possible to update the original [[Visual3D: | ||
+ | |||
+ | 2. The **Excluded Traces** options allow the user to add a BAD event to the excluded traces. This option is helpful if the intention is to [[Visual3D: | ||
+ | |||
+ | 3. Alternatively, | ||
+ | |||
+ | 4. Select both **Add Event to Exclude Signals** with the default event name " | ||
+ | |||
+ | 5. This results in Sift opening each .cmz file in the background, adding a BAD event for the excluded traces, and removing the force assignments. This process can take some time depending on the .cmz size, but you can further explore your data (or get a coffee) as it occurs. When the update is complete, the check mark icons will return beside each workspace in the **Workspaces** widget and the original data will be modified. | ||
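The two update options can be thought of as two independent operations applied to each excluded trace. The sketch below shows that logic on an in-memory dictionary; the dictionary layout, the `apply_exclusions` function, and the choice to place the BAD event at the start time of the trace are all assumptions for illustration. It does not read or write real .cmz files.

```python
import copy


def apply_exclusions(workspace, excluded_traces, event_name="BAD",
                     add_event=True, remove_assignment=True):
    """Return an updated copy of a workspace dictionary (hypothetical layout).

    workspace: {"events": {name: [times]}, "force_assignments": {trace_id: plate}}
    excluded_traces: [{"id": trace_id, "start_time": seconds}, ...]
    """
    updated = copy.deepcopy(workspace)  # leave the caller's data untouched
    for trace in excluded_traces:
        if add_event:
            # Assumption: the event is placed at the start of the excluded trace
            updated["events"].setdefault(event_name, []).append(trace["start_time"])
        if remove_assignment:
            updated["force_assignments"].pop(trace["id"], None)
    if event_name in updated["events"]:
        updated["events"][event_name].sort()
    return updated
```

Working on a copy and returning the result makes it easy to compare the data before and after the update, which is a useful habit given that the real operation modifies the original files.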
+ | |||
+ | |||
+ | ==== Recap ==== | ||
+ | |||
+ | In this tutorial you learned how to use Sift as a data cleaning tool for quality assurance purposes. It covered loading data, defining queries, excluding data, and updating the original data. | ||
+ | |||
+ | |||
sift/tutorials/clean_your_data.1720790858.txt.gz · Last modified: 2024/07/12 13:27 by sgranger