Using K-means to cluster kinetic features in above-knee amputees
Abstract
The day-to-day mobility of above-knee amputees is related to the design of their prosthesis. Although some studies have examined the biomechanics of this underrepresented population, the kinetics of stand-up and sit-down movements (STS) have not been fully studied. Here we use K-means to evaluate kinetic behavior during STS, extracted from Hunt et al.'s public dataset. We found that unsupervised clustering could separate the kinetic signals of the intact and prosthetic sides, but struggled to separate different brands of prosthesis. We conclude that there is minimal variance in the kinetic behavior of participants' prosthetic legs regardless of brand, but that there is a difference between the intact and prosthetic sides.
Data
Public Data Set
The publicly available dataset used in this project is credited to Hunt et al. of the paper "Open dataset of kinetics, kinematics, and electromyography of above-knee amputees during stand-up and sit-down". The dataset includes 3D kinematic and kinetic data for 9 above-knee amputees during stand-up and sit-down with their passive, microprocessor-controlled prostheses. The biomechanics were captured using a 12-camera motion capture system with two force plates and four EMG sensors on the intact lower limb.
In this tutorial, we will use the whole dataset, which can be downloaded from the authors' original website under the name V3D_STS.zip. The folder contains one workspace per participant, for a total of 9 workspaces, which can be loaded directly into Visual3D and Sift.
In Visual3D you can see the full body model that was built using a modified Plug-In gait model.
Kinetic Data
In this analysis we will focus on the Ground Reaction Force (GRF), Knee Joint Flexion/Extension Moment, and Hip Joint Flexion/Extension Moment, normalized to body weight. These link-model-based items were created using the pipelines available in the dataset; the pipeline files are under the path “V3D_STS/V3D_Pipeline_Files”. Below is an example of the pipeline command for generating the Knee Joint Moment (Torque).
Compute_Model_Based_Data
/RESULT_NAME=L_knee_torque
/SUBJECT_TAG=ALL_SUBJECTS
/FUNCTION=JOINT_MOMENT
/SEGMENT=LSK
/REFERENCE_SEGMENT=
/RESOLUTION_COORDINATE_SYSTEM=LTH
! /USE_CARDAN_SEQUENCE=FALSE
/NORMALIZATION=TRUE
/NORMALIZATION_METHOD=DEFAULT_NORMALIZATION
! /NORMALIZATION_METRIC=
! /NEGATEX=FALSE
/NEGATEY=TRUE
/NEGATEZ=TRUE
! /AXIS1=X
! /AXIS2=Y
! /AXIS3=Z
! /TREADMILL_DATA=FALSE
! /TREADMILL_DIRECTION=UNIT_VECTOR(0,1,0)
! /TREADMILL_SPEED=0.0
;
Methods
Visual3D Processing
Although the dataset provides completed .cmz workspaces, some processing was required to prepare them for analysis.
1. Download V3D_STS.zip
2. Add Tags for each workspace: Each workspace indicates a single participant. Since there are only 9 participants in this dataset, we will add tags manually. The table shown below is from the original paper, which includes all the demographics and details of the participants. In this tutorial, we will add tags for these two columns: “Prosthesis Side” and “Knee Prosthesis”.
- Load workspace: In Visual3D, select File → Open/Add…, and go to the V3D_STS folder downloaded previously. Open a workspace. As an example, open the workspace for the TF01 participant: 20210916_TF01_STS.cmz. You should see the workspace loaded as follows:
- Add tags: Click Add New File Tag, and you will see a pop-up window titled Enter the new file tag. According to the table, TF01 has “Prosthesis Side” as “L”, so we enter “Left” in the pop-up window and click Continue». Similarly, since TF01 uses “C-Leg” as the “Knee Prosthesis”, we add another tag named “CLeg”. Make sure both tags are checked.
- Note that there should be no hyphens in the tag!
Now that we have added the tags for TF01, we can repeat the same steps for the other workspaces.
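When repeating the tagging for the remaining workspaces, the no-hyphen rule can be applied consistently with a small helper. This is only a sketch for preparing tag names before typing them into Visual3D; the function name `to_file_tag` and the "R" to "Right" mapping are assumptions for illustration and are not part of Visual3D or Sift.

```python
# Hypothetical helper: convert entries from the paper's table into file tags.
# "Left" matches the tag used for TF01 above; "Right" is assumed by analogy.
SIDE_TAGS = {"L": "Left", "R": "Right"}


def to_file_tag(table_value: str) -> str:
    """Return a tag name without hyphens, e.g. 'C-Leg' -> 'CLeg'."""
    if table_value in SIDE_TAGS:
        return SIDE_TAGS[table_value]
    return table_value.replace("-", "")


print(to_file_tag("L"))      # Left
print(to_file_tag("C-Leg"))  # CLeg
```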
Loading Data into Sift
Within Sift we can visualize and analyze the data.
Select Load Library and click on Browse to go to the V3D_STS folder where all 9 workspaces with updated tags are located. Then select Load and exit the window. Make sure you see the tags showing up on the screen. You could also click on the + button on the left side of each workspace to see whether the corresponding tags are being checked.
Build Queries in Sift
In our analysis we have two main objectives:
- To compare the kinetic signals of the intact side vs. the prosthetic side.
- To compare the kinetic signals of the three knee prosthesis brands (i.e. C-Leg, Rheo, Plie).
To do so we will focus on Ground Reaction Force, Knee Joint Moment in Flexion/Extension, and Hip Joint Moment in Flexion/Extension.
Queries for Objective 1
We will build queries for Ground Reaction Force as an example, but the steps and logic for Knee Joint Moment and Hip Joint Moment are the same.
Open the Query Builder icon on the Explore Page or on the toolbar to prompt the Query Builder Dialog.
Create a new query with Query Name: GRF_intact. This would be the query for all participants' intact-side. Within this query, create the following conditions:
1. Condition Name: left prosthesis CLeg.
- This condition includes all participants whose left leg is the prosthetic leg AND who use the C-Leg knee prosthesis.
- In the Signals tab, select the following settings:
- Since this participant's left leg is the prosthetic, the GRF of the intact leg would be the RIGHT leg. Thus, R_GRF is selected as the Signal Name.
- Since the Z-axis is the vertical axis in the coordinate system in this study, we select Z for the Component, as the ground reaction force is in the vertical direction from the floor up to the leg.
- In the Events tab, within the All Events block, you should see two events: Foot Off and Foot Strike. Select Foot Off and click the > button to add it into Event Sequence block. Then select Foot Strike and click the > button to add it into Event Sequence block. If you accidentally selected the wrong event, you can also click the < button to move the event from Event Sequence back to All Events block. You can also use the Up and Down buttons to move the event correspondingly in the sequence.
- In the Refinement tab, check Refine using tag and Use AND Logic, and then select the tags CLeg and Left. Then click Save.
2. Condition Name: right prosthesis CLeg.
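- This name indicates that the participant has the right leg as the prosthesis, so the left leg is the intact leg.
- Signals tab: Select L_GRF as the Signal Name, with Z as the Component.
- Events tab: Same as previous.
- Refinement tab: Check Refine using tag and Use AND Logic, and then select the tags CLeg and Right.
- Click Save.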
Up to this point, we have successfully created the conditions for C-Leg. The same steps are repeated for Plie and Rheo, and the only difference would be to select the corresponding tags for the different brands of prosthetics.
After creating the GRF_intact query, we can then use the same logic as above to create the GRF_prosthesis query. Note that in this case, when we are creating for example left prosthesis CLeg, we select L_GRF for the Signal Name, because the left leg would be the prosthetic leg.
The queries for Hip Moment and Knee Moment have the same logic as above.
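Because the same side-swapping logic is repeated for every brand and signal, it can help to write the intended conditions down before entering them in the Query Builder. The sketch below only generates a checklist; it does not call any Sift API. The function names, the "Right" tag, and the dictionary layout are assumptions for illustration.

```python
from itertools import product

BRAND_TAGS = ["CLeg", "Plie", "Rheo"]
SIDE_PREFIX = {"Left": "L", "Right": "R"}  # prosthesis-side tag -> signal prefix


def signal_name(prosthesis_side: str, limb: str, signal: str = "GRF") -> str:
    """Signal for the 'intact' or 'prosthesis' limb, given the prosthesis-side tag."""
    prefix = SIDE_PREFIX[prosthesis_side]
    if limb == "intact":
        # The intact limb is on the opposite side of the prosthesis.
        prefix = "R" if prefix == "L" else "L"
    elif limb != "prosthesis":
        raise ValueError("limb must be 'intact' or 'prosthesis'")
    return f"{prefix}_{signal}"


def build_conditions(limb: str, signal: str = "GRF"):
    """List the six conditions (2 sides x 3 brands) for one query."""
    conditions = []
    for side, brand in product(SIDE_PREFIX, BRAND_TAGS):
        conditions.append({
            "condition": f"{side.lower()} prosthesis {brand}",
            "signal": signal_name(side, limb, signal),
            "component": "Z" if signal == "GRF" else "X",
            "events": ["Foot Off", "Foot Strike"],
            "tags": [brand, side],  # combined with AND logic
        })
    return conditions


for cond in build_conditions("intact"):
    print(cond["condition"], "->", cond["signal"], cond["tags"])
```

Calling `build_conditions("prosthesis")` gives the matching checklist for the GRF_prosthesis query.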
Queries for Objective 2
For this objective, we want to compare the kinetic data of the prosthetic leg across different brands. Thus, we will create queries specifically for each brand: C-Leg, Plie, and Rheo. For illustration purposes, we will take C-Leg as an example, but the idea is the same for the other two brands.
- In the Explore Page, click on the Query Builder icon.
- Create a new query, and set Query Name as “CLeg GRF_prosthesis” to indicate that this query is for participants who use C-Leg as their prosthetic leg, and we will focus on the Ground Reaction Force of the prosthetic legs.
- Within this query, create two conditions:
1. A condition named as left prosthesis for participants with left leg as prosthetic.
- Signals :
- Type: LINK_MODEL_BASED
- Folder: ORIGINAL
- Signal Name: L_GRF
- Component: Z
- Events Sequence is Foot Off followed by Foot Strike
- Refinement:
- Refine using tag: checked
- Use AND Logic: checked
- Tags: CLeg, Left
- Click Save
2. A condition named as right prosthesis for participants with right leg as prosthetic.
- Signals :
- Type: LINK_MODEL_BASED
- Folder: ORIGINAL
- Signal Name: R_GRF
- Component: Z
- Events Sequence is Foot Off followed by Foot Strike
- Refinement:
- Refine using tag: checked
- Use AND Logic: checked
- Tags: CLeg, Right
- Click Save
Now that we have built the C-Leg query for GRF, we can do the same for Knee Joint Moment and Hip Joint Moment. Note that the Component should be set to X for Knee Joint Moment and Hip Joint Moment, because flexion/extension of these joints is a rotation about the X-axis, whereas the GRF of interest acts along the Z-axis.
After all queries are created, click on Calculate All Queries at the bottom of the Query Builder Dialog.
Visualizing Data in Sift
Before going into the analysis we can visualize the data to see if we can identify any patterns between groups.
As an example, we can quickly visualize the GRF difference between intact sides vs. prosthetic sides:
- Go to the Explore page
- Select both GRF_intact and GRF_prosthesis in the Groups block by pressing Ctrl, and then check Select All Workspaces.
- Check Plot Group Mean and Plot Group Dispersion.
- A plot like the following is generated:
The X-axis is the normalized time points, with the following events in the corresponding ranges:
- Standing Up: Point 0 ~ 40
- Standing: Point 40 ~ 60
- Sitting Down: Point 60 ~ 100
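The ranges above can be expressed as a small lookup, which is handy when annotating exported curves. This is a minimal sketch; the function name is hypothetical, and assigning the shared boundary points 40 and 60 to the later phase is an assumption, since the ranges listed above overlap at those points.

```python
def sts_phase(point: int) -> str:
    """Map a normalized time point (0-100) to its stand-up/sit-down phase."""
    if not 0 <= point <= 100:
        raise ValueError("point must be between 0 and 100")
    if point < 40:
        return "Standing Up"
    if point < 60:  # assumption: point 40 counts as Standing
        return "Standing"
    return "Sitting Down"  # assumption: point 60 counts as Sitting Down


print(sts_phase(20))  # Standing Up
print(sts_phase(50))  # Standing
print(sts_phase(80))  # Sitting Down
```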
Once we are happy with the data that is visualized we can go ahead with the analysis.
PCA Analysis
Principal Component Analysis (PCA) is a dimensionality reduction technique that transforms high-dimensional data into a lower-dimensional space to reveal patterns and visualize variations within the data. In this project, we will show how to perform PCA to visualize the variations of kinetic features between intact leg and prosthetic leg, as well as across the three prosthetic brands. The PCA scores will then be used to perform K-means clustering.
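To make the idea concrete, the sketch below computes only the first principal component of a set of time-normalized curves using power iteration in plain Python. It is not Sift's implementation, which computes several PCs at once. The curves are synthetic, not values from the dataset: two groups of half-sine waveforms that differ only in amplitude, standing in for "intact" and "prosthetic" traces.

```python
import math
import random


def first_pc(curves, iters=100, seed=0):
    """Return (mean curve, PC1 loading, PC1 scores, fraction of variance explained)."""
    n, m = len(curves), len(curves[0])
    mean = [sum(c[j] for c in curves) / n for j in range(m)]
    X = [[c[j] - mean[j] for j in range(m)] for c in curves]  # centered data
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(m)]
    for _ in range(iters):
        # One power-iteration step: v <- X^T (X v), then normalize.
        proj = [sum(x * w for x, w in zip(row, v)) for row in X]
        w_new = [sum(X[i][j] * proj[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(x * x for x in w_new))
        if norm == 0.0:
            raise ValueError("curves have no variance")
        v = [x / norm for x in w_new]
    scores = [sum(x * w for x, w in zip(row, v)) for row in X]
    total = sum(x * x for row in X for x in row)
    explained = sum(s * s for s in scores) / total
    return mean, v, scores, explained


# Synthetic example: three larger-amplitude and three smaller-amplitude curves.
amplitudes = [0.58, 0.60, 0.62, 0.38, 0.40, 0.42]
curves = [[a * math.sin(math.pi * j / 100) for j in range(101)] for a in amplitudes]
mean_curve, loading, scores, explained = first_pc(curves)
print([round(s, 3) for s in scores])
print(round(explained, 3))
```

In this toy case the two groups receive PC1 scores of opposite sign, which is the kind of separation the K-means step later relies on.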
The following PCA results are generated for further analysis.
For Objective 1
- Groups: GRF_intact, GRF_prosthesis
- PCA Name: GRF_all_together
- Number of PCs: 4
- Use Workspace Mean: Unchecked
- Groups: Hip_Moment_prosthesis, Hip_Moment_intact
- PCA Name: Hip_Moment_all_together
- Number of PCs: 4
- Use Workspace Mean: Unchecked
- Groups: Knee_Moment_intact, Knee_Moment_prosthesis
- PCA Name: Knee_Moment_all_together
- Number of PCs: 4
- Use Workspace Mean: Unchecked
For Objective 2
- Groups: CLeg GRF_prosthesis, Plie GRF_prosthesis, Rheo GRF_prosthesis
- PCA Name: GRF Prosthesis
- Number of PCs: 7
- Use Workspace Mean: Unchecked
- Groups: CLeg Knee Moment prosthesis, Plie Knee Moment prosthesis, Rheo Knee Moment prosthesis
- PCA Name: Knee Moment Prosthesis
- Number of PCs: 6
- Use Workspace Mean: Unchecked
- Groups: CLeg Hip Moment prosthesis, Plie Hip Moment prosthesis, Rheo Hip Moment prosthesis
- PCA Name: Hip Moment Prosthesis
- Number of PCs: 7
- Use Workspace Mean: Unchecked
K-means Analysis
K-means is an unsupervised clustering technique that groups similar data points together to form a “cluster” based on the Euclidean distances between points. In our project, we will use K-means clustering to address our two objectives:
- Investigating whether K-means can find the clusters for “Intact” and “Prosthetic” based on the PC scores of kinetic data.
- Investigating whether K-means would be able to cluster the three prosthetic brands based on GRF.
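For readers unfamiliar with the algorithm, here is a minimal plain-Python sketch of K-means (Lloyd's algorithm) applied to 2-D points such as (PC1, PC2) scores. It is not Sift's implementation; the deterministic farthest-point initialisation and the example points are assumptions chosen to keep the sketch reproducible.

```python
import math


def kmeans(points, k, iters=100):
    """Cluster points (tuples of floats) into k groups by Euclidean distance."""
    # Initialisation: start from the first point, then repeatedly add the
    # point farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


# Hypothetical PC scores forming two well-separated groups.
pc_scores = [(-1.0, 0.1), (-1.2, -0.1), (-0.9, 0.0),
             (1.0, 0.2), (1.1, -0.2), (0.8, 0.0)]
labels, centroids = kmeans(pc_scores, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```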
The following shows the settings for performing K-means in Sift.
Settings for Objective 1
- Go to the Analyse page, select the PCA Results which you would like to perform K-means analysis on.
- From the toolbar, click the Outlier Detection Using PCA icon and select K-means from the drop-down menu.
- Use the following settings to perform K-means:
Settings for Objective 2
- Go to the Analyse page, select the PCA Results which you would like to perform K-means analysis on.
- From the toolbar, click the Outlier Detection Using PCA icon and select K-means from the drop-down menu.
- Use the following settings to perform K-means. Note that the K-means settings are the same for all cases; we only need to change the “Number of Clusters” parameter accordingly.
Results / Discussion
Visualizing Data in Sift
Comparison of Ground Reaction Force of Intact side v.s. Prosthesis side
We can see that the GRF of the intact side is much greater than that of the prosthetic side, indicating that people put more weight on the intact side when standing up and sitting down.
Comparison of Knee Moment of Intact side v.s. Prosthesis side
Comparison of Hip Moment of Intact side v.s. Prosthesis side
The above three figures all show the same pattern: there is a noticeable difference in Ground Reaction Force, Knee Joint Moment and Hip Joint Moment between intact and prosthetic legs in the standing-up and sitting-down motions. This is especially obvious for Ground Reaction Force, where the standard deviation bands of the two curves barely overlap.
Ground Reaction Force of Prosthesis side across three brands
Knee Moment of Prosthesis side across three brands
Hip Moment of Prosthesis side across three brands
When comparing the kinetic signals across the three prosthetic brands, the variation is much less obvious: the standard deviation bands of the three brands overlap substantially.
PCA Analysis
PCA results provide a straightforward visualization of the variation between groups within the data. Understanding this variation also helps in the downstream analysis, where we perform K-means on the PCA results to investigate whether K-means identifies the groups in an unsupervised manner, based on the similarity between data points.
For interpretation, we only take Ground Reaction Force as an example.
Variance Explained
Our 4 principal components explain 81.3% of the dataset's variability, and Principal Component 1 alone explains 59.3% of the variation in the original dataset. In practice, it is recommended to increase the number of principal components until 90% to 95% of the variability is explained, but for demonstration purposes we continue the exploration with 4.
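The 90% to 95% guideline amounts to accumulating the per-component explained variance until a target is reached. The sketch below illustrates that bookkeeping. Only the PC1 value (59.3%) and the four-component total (81.3%) come from this analysis; every other percentage in the list is a hypothetical placeholder, so the resulting component count is illustrative only.

```python
def components_needed(explained_pct, target_pct):
    """Smallest number of leading PCs whose cumulative variance reaches the target."""
    total = 0.0
    for n, pct in enumerate(explained_pct, start=1):
        total += pct
        if total >= target_pct - 1e-9:
            return n
    return None  # target not reachable with the listed components


# PC1 = 59.3 and sum of first four = 81.3 are from the text above;
# the split across PC2-PC4 and all later values are hypothetical.
explained_pct = [59.3, 12.0, 6.0, 4.0, 3.5, 3.0, 2.5, 2.0, 1.5]

print(components_needed(explained_pct, 80.0))
print(components_needed(explained_pct, 90.0))
```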
Group Scores
Since we want to use PCA to distinguish between groups, we will look at the group scores to find the PC that explains the most variance between groups. In our case, this is PC1, where we can see a big separation between the two groups.
Loading Vector
The loading vector shows which parts of the curve contribute to the variance captured by a PC. Regions where the loading is far from zero, whether positive or negative, are where that PC explains a lot of variance. In this figure, we plot the loading vector for PC1 and see that the whole curve is above zero, meaning PC1 captures variation that acts in the same direction across the entire movement, which is consistent with PC1 explaining the majority of the variance.
Extreme Plot
The Extreme Plot allows us to visualize a principal component over the range of Mean +/- 2 Standard Deviations to get further insight into our problem. The curves at +/- 2 standard deviations differ in amplitude mainly around 20% and 80% of the normalized time, which correspond to the “standing up” and “sitting down” motions.
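The curves in an extreme plot can be reconstructed from the mean curve, a PC's loading vector, and the standard deviation of that PC's scores. The sketch below shows the standard reconstruction, mean ± 2·SD·loading; whether Sift computes its plot in exactly this way is an assumption, and the numbers are toy values.

```python
def extreme_curves(mean_curve, loading, score_sd, n_sd=2.0):
    """Return the (lower, upper) curves at mean -/+ n_sd standard deviations along one PC."""
    lower = [m - n_sd * score_sd * w for m, w in zip(mean_curve, loading)]
    upper = [m + n_sd * score_sd * w for m, w in zip(mean_curve, loading)]
    return lower, upper


# Toy three-point example (hypothetical values).
lower, upper = extreme_curves([1.0, 2.0, 3.0], [0.0, 0.5, -0.5], score_sd=1.0)
print(lower)  # [1.0, 1.0, 4.0]
print(upper)  # [1.0, 3.0, 2.0]
```

Points where the loading is zero do not move, which is why the two curves coincide wherever the PC carries no variance.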
Workspace Scores
Workspace scores help us visualize the variation of the dataset along each principal component, and show whether there is a clear separation between groups. In this plot, the points are colored by workspace (i.e. each color indicates a single participant, and each dot represents a single trace of that participant). We see that each participant has data points on both sides of the PC1 = 0 vertical line, which illustrates the variation between the intact side and the prosthetic side of each participant, and we can also see the data points cluster on both sides.
K-means Analysis
Applying K-means in this project allows us to explore whether any natural structure in the data aligns with known differences between groups. We compare whether the clusters identified using K-means align with the groups we have created. To do this, on the toolbar, click the Show Data Options icon, and in the pop-up window, select Cluster in the Display Styles From… block to color the data points based on K-means cluster labels, or select Group to color the data points based on the real group labels.
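The visual comparison can also be summarized numerically. Since cluster numbers are arbitrary, the sketch below tries every assignment of clusters to groups and reports the best agreement, together with the indices of the traces that end up in the "wrong" cluster. The function name and the example labels are hypothetical; this is not a Sift feature.

```python
from itertools import permutations


def best_agreement(group_labels, cluster_labels):
    """Return (fraction matched, cluster->group mapping, mismatched indices)."""
    groups = sorted(set(group_labels))
    clusters = sorted(set(cluster_labels))
    if len(groups) != len(clusters):
        raise ValueError("this sketch assumes as many clusters as groups")
    best_hits, best_map = -1, None
    for perm in permutations(groups):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[c] == g for g, c in zip(group_labels, cluster_labels))
        if hits > best_hits:
            best_hits, best_map = hits, mapping
    mismatched = [
        i for i, (g, c) in enumerate(zip(group_labels, cluster_labels))
        if best_map[c] != g
    ]
    return best_hits / len(group_labels), best_map, mismatched


# Hypothetical labels for eight traces.
groups = ["intact"] * 4 + ["prosthetic"] * 4
clusters = [1, 1, 1, 2, 2, 2, 2, 2]
fraction, mapping, mismatched = best_agreement(groups, clusters)
print(fraction, mapping, mismatched)
```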
Objective 1: Intact v.s. Prosthetic
GRF
Group Labels v.s. K-means Clusters
We first compared the Ground Reaction Force between the intact leg and prosthetic leg, and evaluated K-means performance on differentiating the two groups. We colored the data points based on real labels as shown in the left figure below, where green is the intact side, and yellow is the prosthetic side. The K-means predicted labels are shown in the right figure below, with red and blue to represent the two clusters identified.
The incorrectly clustered data points are circled, and a boundary line is drawn to separate the two clusters for better visualization.
We observe that K-means clustering effectively identified two clusters based on Ground Reaction Force (GRF) data, and by comparing with the real group labels, we found that the cluster labels align well with the intact group and the prosthetic group. Although K-means operates without any prior knowledge of group labels, its ability to uncover this separation suggests that there are meaningful differences in GRF patterns between the two groups. Specifically, the GRF profiles of intact legs appear more similar to each other than to those of prosthetic legs. This intra-group similarity is reflected in the relatively small Euclidean distances between data points within each group, which K-means captures to form distinct clusters. The alignment of these clusters with the real group labels supports the view that GRF contains discriminative features relevant to the intact vs. prosthetic distinction.
Knee Joint Moment
Group Labels v.s. K-means Clusters
We then compare the Knee Joint Moment between the intact side and the prosthetic side to see whether K-means performs similarly. The left figure shows the real labels of the two groups, where green is the intact side and yellow is the prosthetic side. The right figure shows the two clusters generated by K-means.
Data points that are incorrectly clustered are circled:
First, we observed that in the real labels, the data points for the prosthetic side are grouped much more densely, whereas the data points for the intact side are more sparsely distributed. This indicates that the knee joint moment of the intact leg varies more across participants, while the knee joint moment of the prosthesis is more similar regardless of participant weight or prosthetic brand. This similarity can be explained by the fact that prosthetic legs typically operate within a limited and predefined range of mechanical behavior. They are designed to mimic an average gait, and do not have the same neuromuscular control or adaptability as a biological knee. Thus, the prosthetic knee joint moments are grouped more densely, as the knee joint is directly affected by the prosthetic leg.
Second, we observed that the majority of data points are assigned cluster labels that align well with their original group labels. Only a small number of data points are assigned to the opposite cluster, and these are circled in the plot. This suggests that K-means considered them more similar to the other cluster than to their own.
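"Densely grouped" versus "sparsely distributed" can be quantified as the mean Euclidean distance of each group's points from that group's centroid in PC space. The sketch below is one simple way to do this; the function name and the example points are hypothetical and do not come from the dataset.

```python
import math


def dispersion(points):
    """Mean Euclidean distance of points (tuples of floats) from their centroid."""
    centroid = tuple(sum(x) / len(points) for x in zip(*points))
    return sum(math.dist(p, centroid) for p in points) / len(points)


# Hypothetical (PC1, PC2) scores: a tight group and a spread-out group.
tight_group = [(0.0, 0.0), (0.2, 0.0)]
spread_group = [(0.0, 0.0), (2.0, 0.0)]
print(dispersion(tight_group) < dispersion(spread_group))  # True
```

A smaller value indicates a denser group, so the pattern described above would show up as a smaller dispersion for the prosthetic side than for the intact side.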
Hip Joint Moment
Similarly, we looked at the Hip Joint Moment. In the left figure, the green group represents the prosthetic side, and the yellow group represents the intact side.
Group Labels v.s. K-means Clusters
Data points that are incorrectly clustered are circled, and a boundary line is drawn:
Contrary to what we observed for Knee Joint Moment, here the data points for the prosthetic side are more sparse than the data points for the intact side. This reflects how people use their hips to compensate differently for the loss of knee function on the prosthetic side. For example, the hip on the prosthetic side may take on extra flexion/extension load to initiate swing, stabilize during stance, and control limb advancement. Since participants vary in factors such as weight and muscle strength, they compensate differently based on comfort, prosthetic alignment, training, and mobility goals, and thus hip joint moments vary more widely on the prosthetic side. K-means then shows that the clusters align well with the original group labels, with only three data points considered more similar to the opposite cluster, as circled in the plot. This result again supports the claim that there is a noticeable difference in hip joint moments between the intact side and the prosthetic side.
Objective 2: Across Three Prosthetic Brands
In this section, we took a closer look at the GRF data specifically, as this is the kinetic signal that showed the largest variation between the intact and prosthetic sides, and we are therefore interested in whether this signal also differs across prosthetic brands. We performed K-means with different parameter settings to investigate whether K-means would be able to differentiate the three brands of prosthetic legs.
GRF
Group Labels v.s. K-means Clusters (K = 2)
First, we observe from the overall structure that all three groups show similar grouping densities, which indicates that the ground reaction force differences across participants are minimal, regardless of the prosthetic brand. However, there is some variation in GRF between C-Leg (Purple) and Plie (Green), as shown on the two sides of the PC1 = 0 vertical line, along with some overlap of the three groups in the middle of the plot. We then performed K-means with only 2 clusters, and we see that the pattern aligns with the original groups of C-Leg and Plie. For the Rheo group, the data points to the left of the PC1 = 0 vertical line are assigned to Cluster 1 (Red), whereas those to the right of the PC1 = 0 vertical line are assigned to Cluster 2 (Blue). Thus, this result supports that there is some variation in the ground reaction force between C-Leg and Plie, but the separation is not clear.
Then, we also performed K-means with 3 and 4 clusters.
Group Labels v.s. K-means Clusters (K = 3)
We observe that Cluster 1 and Cluster 2 generally align with the C-Leg and Plie groups respectively to some extent. Notably, the K-means clustering result shows particularly strong alignment for the Plie group: as seen in the right plot, Cluster 2 (blue) consists exclusively of data points originally from the Plie group. However, within the PC1 range of approximately [-0.5, 0.5], where the original data points from all three brands are intermixed, K-means assigns them to a single cluster rather than separating them. This outcome is consistent with the underlying data structure, as these points are more similar to each other than to those in the outer regions, leading to smaller Euclidean distances and thus their assignment into the same cluster.
Group Labels v.s. K-means Clusters (K = 4)
On the left side of the PC1 = 0 vertical line, we observe that Cluster 4 (red), as identified by K-means, consists almost entirely of data points originally from the C-Leg group (as shown by the red points in the left figure). In the region where PC1 ranges from approximately -0.5 to 0.5, where data points from all three brands are intermixed, K-means assigns them to a single cluster (Cluster 1, light green), likely due to their higher similarity to one another in that space. Meanwhile, Cluster 2 (blue) consists exclusively of data points from the Plie group, consistent with earlier patterns. These observations together support the conclusion that variation in ground reaction force across the three brands is relatively minor, as indicated by the overlapping data points and K-means’ tendency to group them into the same cluster.
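Statements such as "Cluster 2 consists exclusively of Plie points" can be checked with a simple cross-tabulation of cluster labels against brand labels. The sketch below builds that table and reports, for each cluster, its dominant brand and the fraction of the cluster it makes up. The labels shown are hypothetical placeholders, not the study's results.

```python
from collections import Counter


def cluster_composition(brand_labels, cluster_labels):
    """Return {cluster: Counter({brand: count})}."""
    table = {}
    for brand, cluster in zip(brand_labels, cluster_labels):
        table.setdefault(cluster, Counter())[brand] += 1
    return table


def dominant_brand(table):
    """Return {cluster: (most common brand, fraction of the cluster it makes up)}."""
    summary = {}
    for cluster, counts in table.items():
        brand, count = counts.most_common(1)[0]
        summary[cluster] = (brand, count / sum(counts.values()))
    return summary


# Hypothetical labels for seven traces.
brands = ["CLeg", "CLeg", "Plie", "Plie", "Rheo", "Rheo", "Plie"]
clusters = [1, 3, 2, 2, 3, 3, 2]
table = cluster_composition(brands, clusters)
print(dominant_brand(table))
```

A fraction of 1.0 means the cluster contains a single brand; values well below 1.0 correspond to the intermixed region in the middle of the plot.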
Although the K-means clustering results for Knee Joint Moment (KJM) and Hip Joint Moment (HJM) across the three brands are not shown in this tutorial, the Visualizing Data in Sift section indicates substantial overlap in the KJM and HJM curves among brands. Given this overlap, we would not necessarily expect K-means, as a fully unsupervised method, to separate the brands cleanly based on these features. If the clustering were to correspond well with brand labels, it would suggest that KJM or HJM captures some underlying distinctions between the brands. Conversely, a lack of clear separation does not imply failure, but rather reinforces that these joint moment features alone may not strongly differentiate the brands in an unsupervised setting.
Conclusion
While K-means clustering effectively distinguished the intact and prosthetic sides, it was not able to differentiate the three prosthetic brands. The results support that there is minimal variation in the prosthetic legs' kinetic behavior across the brands, but notable differences exist between the intact and prosthetic sides. There are some limitations in our project. First, the sample size is very small, with only 9 participants, so we did not use the workspace mean for PCA but rather used all traces. Second, the prosthetic-side shank segment was modified in the original dataset to better account for mass and inertial differences; however, this adjustment may not fully represent real-world dynamics and kinetics.