sift:tutorials:run_k-means
Differences
This shows you the differences between two versions of the page.
sift:tutorials:run_k-means [2024/06/19 13:55] – created sgranger | sift:tutorials:run_k-means [2024/11/05 14:58] (current) – wikisysop | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | The k-means clustering algorithm is a commonly used method for grouping //n// individual data points into //k// clusters. In this tutorial it is applied to the results of a principal component analysis (PCA), a multivariate statistical analysis that reduces the high-dimensional matrix of correlated, time-varying signals into a low-dimensional and statistically uncorrelated set of principal components (PCs). These PCs explain the variance found in the original signals and represent the most important features of the data, e.g., the overall magnitude or the shape of the time series at a particular point in the stride cycle. Each subject's score on an individual PC represents how strongly that feature was present in that subject's data. | + | ====== Run K-Means ====== |
- | ===== The utility of clustering | + | The k-means |
- | When analysing biomechanical signals, we often realize that a number | + | ==== The utility |
- | ===== Tutorial Overview ===== | + | When analysing biomechanical signals, we often realize that a number of individual traces are similar. It can be useful to describe these traces as belonging to the same group, or cluster. This potentially allows us to simplify our analysis or to pick a single trace as being " |
- | This tutorial works off the Principal Component Analysis | + | ==== Tutorial |
- | ===== Running | + | This tutorial works off the [[sift: |
- | {{sift_new_kmeans.png}} | + | ==== Running a K-Means Test ==== |
- | | + | {{: |
+ | |||
+ | | ||
- Select K-Means in the dropdown. | - Select K-Means in the dropdown. | ||
- Change the number of clusters to suit your analysis. This can be iterative: run the K-Means analysis several times, adjusting the number of clusters, until you are happy with the output. For this example we will use 2. | - Change the number of clusters to suit your analysis. This can be iterative: run the K-Means analysis several times, adjusting the number of clusters, until you are happy with the output. For this example we will use 2. | ||
Line 23: | Line 25: | ||
Once K-Means clustering is complete, the dialog lists summary information for each cluster, including the cluster center, the cluster radius, and the workspaces grouped into that cluster.\\ | Once K-Means clustering is complete, the dialog lists summary information for each cluster, including the cluster center, the cluster radius, and the workspaces grouped into that cluster.\\ | ||
- | {{kmeansDlg.png}} | + | {{:kmeansDlg.png}} |
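The per-cluster summary described above (center, radius, member workspaces) can be illustrated with a minimal sketch in plain Python. This is not Sift's implementation: the scores and workspace names are made up, the starting centres are picked at random rather than with k-means++, and the radius is assumed here to be the largest distance from the center to any member, which may differ from how Sift defines it.

```python
import math
import random


def kmeans(points, k, max_iterations=100, seed=0):
    """Plain Lloyd's k-means on tuples of floats; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # simple random start, not k-means++
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centre.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the result is stable
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centers, labels


def summarize(points, names, centers, labels):
    """Build one summary record per cluster: center, radius, member names."""
    summary = []
    for c, center in enumerate(centers):
        members = [i for i, lab in enumerate(labels) if lab == c]
        # ASSUMPTION: radius = largest member-to-center distance.
        radius = max((math.dist(points[i], center) for i in members), default=0.0)
        summary.append(
            {
                "center": center,
                "radius": radius,
                "workspaces": [names[i] for i in members],
            }
        )
    return summary


# Hypothetical (PC1, PC2) scores for six hypothetical workspaces.
scores = [(-2.1, 0.3), (-1.8, -0.2), (-2.4, 0.1), (1.9, 0.4), (2.2, -0.3), (2.0, 0.0)]
names = ["ws1", "ws2", "ws3", "ws4", "ws5", "ws6"]

centers, labels = kmeans(scores, k=2)
summary = summarize(scores, names, centers, labels)
for cluster in summary:
    print(cluster)
```

Because the two made-up groups of scores are well separated, the sketch places ws1 to ws3 in one cluster and ws4 to ws6 in the other; which cluster is listed first depends on the random start.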
- | ===== Viewing K-Means Results | + | ==== Viewing K-Means Results ==== |
- | {{DataDlg.png}} | + | {{: |
Once you have run your K-Means Test and taken a brief look at the cluster' | Once you have run your K-Means Test and taken a brief look at the cluster' | ||
- | - Open up the {{sift_data_options.png}} **Data Options** dialog. | + | - Open up the {{:sift_data_options.png}} **Data Options** dialog. |
- In the top right corner under **Display Styles From...** make sure that **Cluster** is selected. | - In the top right corner under **Display Styles From...** make sure that **Cluster** is selected. | ||
- From the **Data Options** dialog you are also able to change the color or style of each cluster. Select **Clusters** in the **Edit Styles From** list on the left and play around with editing the styles of each cluster. | - From the **Data Options** dialog you are also able to change the color or style of each cluster. Select **Clusters** in the **Edit Styles From** list on the left and play around with editing the styles of each cluster. | ||
- Navigate to the **Analyse** page and select the **Workspace Scores** tab in your PCA results. | - Navigate to the **Analyse** page and select the **Workspace Scores** tab in your PCA results. | ||
- | Looking at the **Workspace Scores** tab, we can select different points and the group and file will be displayed. This allows us to see which group each data point in a cluster belongs to. We can clearly see the data points split into two clusters, blue and red, with some separation between them. | + | Looking at the **Workspace Scores** tab, we can select different points and the group and file will be displayed. This allows us to see which group each data point in a cluster belongs to. We can clearly see the data points split into two clusters, blue and red, with some separation between them. | ||
- | {{WorkspaceScores.png}} | + | {{: |
A K-Means test groups similar data points together into clusters. If your two groups were vastly different, no cluster would contain a mix of groups. If data points from different groups are similar, a cluster may contain data points from more than one group. | A K-Means test groups similar data points together into clusters. If your two groups were vastly different, no cluster would contain a mix of groups. If data points from different groups are similar, a cluster may contain data points from more than one group. | ||
Line 44: | Line 46: | ||
If we look at the results from this K-Means test we can see that the clusters are not a perfect representation of each group, signifying that there is some overlap and similarity between the groups. The graph on the left shows the groups, with the osteoarthritis group in purple and the normal group in green. The graph on the right shows the two clusters. The red cluster seems to be mainly the osteoarthritis group, and the blue cluster seems to be mainly the normal group. The points circled in red show some osteoarthritis data points in the second cluster, again indicating some overlap. | If we look at the results from this K-Means test we can see that the clusters are not a perfect representation of each group, signifying that there is some overlap and similarity between the groups. The graph on the left shows the groups, with the osteoarthritis group in purple and the normal group in green. The graph on the right shows the two clusters. The red cluster seems to be mainly the osteoarthritis group, and the blue cluster seems to be mainly the normal group. The points circled in red show some osteoarthritis data points in the second cluster, again indicating some overlap. | ||
- | {{sift_groups_clusters.png}} | + | {{:sift_groups_clusters.png}} |
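The comparison of groups against clusters can also be made numerically by counting how many data points of each group land in each cluster. The sketch below is only an illustration: the group and cluster names follow the tutorial, but the individual labels and counts are invented, and `cross_tabulate` and `purity` are hypothetical helper names, not Sift functions.

```python
from collections import Counter


def cross_tabulate(groups, clusters):
    """Count how many data points of each group fall in each cluster."""
    return Counter(zip(clusters, groups))


def purity(table, cluster):
    """Fraction of a cluster's points that belong to its largest group."""
    counts = [n for (c, _group), n in table.items() if c == cluster]
    return max(counts) / sum(counts)


# Invented labels for eight data points: one osteoarthritis point
# ends up in the mostly-normal blue cluster.
groups = ["Osteoarthritis"] * 4 + ["Normal"] * 4
clusters = ["red", "red", "red", "blue", "blue", "blue", "blue", "blue"]

table = cross_tabulate(groups, clusters)
for cluster in sorted(set(clusters)):
    counts = {g: table[(cluster, g)] for g in sorted(set(groups))}
    print(cluster, counts, "purity:", purity(table, cluster))
```

A purity of 1.0 means a cluster contains a single group; lower values indicate the kind of overlap between groups described above.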
- | ===== Reference | + | ==== Reference ==== |
The k-means clustering algorithm is more than 50 years old and is described in almost every textbook on data analysis and machine learning. Sift specifically implements the k-means++ algorithm, which improves how the initial cluster centres are chosen by favouring starting centres that are spread apart. | The k-means clustering algorithm is more than 50 years old and is described in almost every textbook on data analysis and machine learning. Sift specifically implements the k-means++ algorithm, which improves how the initial cluster centres are chosen by favouring starting centres that are spread apart. |
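For readers curious about the k-means++ seeding step, here is a minimal sketch of the standard textbook procedure in plain Python. It is an illustration under stated assumptions, not Sift's code: `kmeans_pp_init` is a hypothetical name, the points are invented, and the input is assumed to contain at least //k// distinct points.

```python
import math
import random


def kmeans_pp_init(points, k, seed=0):
    """Choose k starting centres using the standard k-means++ seeding rule."""
    rng = random.Random(seed)
    # The first centre is drawn uniformly at random from the data.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from every point to its nearest chosen centre.
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        # Draw the next centre with probability proportional to that squared
        # distance, so far-away points are favoured and already-chosen
        # points (weight 0) are never picked again.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


# Invented two-dimensional points forming two loose groups.
points = [(-2.1, 0.3), (-1.8, -0.2), (-2.4, 0.1), (1.9, 0.4), (2.2, -0.3), (2.0, 0.0)]
initial_centers = kmeans_pp_init(points, k=2)
print(initial_centers)
```

The centres returned here would then be used as the starting point for the usual assign-and-update iterations of k-means.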
sift/tutorials/run_k-means.1718805324.txt.gz · Last modified: 2024/06/19 13:55 by sgranger