The UMAP figure here is generated using ONLY the samples uploaded on the first tab [Prediction], or the default built-in testing dataset, after clicking [Predict with Random forest]. The UMAP plot reflects the actual bacterial abundance within each sample of the batch. Predicted values are not used as variables in the plot; they serve only as metadata for coloring.
The parameters on the left control the shape of the UMAP plot. They were chosen somewhat arbitrarily, as the ones likely to have the most significant effect on the shape of the embedding.
* Color by:
the column from the predicted metadata used to color the points. Note that the same column is used for the rain cloud plot when you select points on the UMAP with the lasso tool.
* UMAP: Number of Neighbors:
controls local structure: how many neighboring samples are considered for each sample. Must be an integer greater than 1. This usually affects how many main sample groups you can resolve from the bacterial profiles.
* UMAP: Local Connectivity:
how many connections each sample must have; the minimum is 1. This parameter is usually linked to sample cohesiveness and cluster area: larger values usually produce much more spread-out structures.
* UMAP: Minimum distance:
is a value between 0 and 1 that sets how tightly points may be packed together in the embedding. It affects the trade-off between local detail and global structure: a lower value may be preferable if you want to explore the fine structure within clusters, while a higher value can help if you are interested in the global arrangement of the data.
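The constraints on these three settings can be sketched as a small validation helper. This is a hypothetical illustration, not the application's own code: the function name `check_umap_params` is invented here, the keyword names `n_neighbors`, `local_connectivity`, and `min_dist` are assumed to mirror those used by common UMAP implementations, and treating the 0 and 1 bounds of the minimum distance as inclusive is also an assumption.

```python
def check_umap_params(n_neighbors, local_connectivity, min_dist):
    """Validate the three UMAP settings (hypothetical helper, not the app's code)."""
    errors = []

    # Number of Neighbors: must be an integer greater than 1.
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if (isinstance(n_neighbors, bool)
            or not isinstance(n_neighbors, int)
            or n_neighbors <= 1):
        errors.append("n_neighbors must be an integer greater than 1")

    # Local Connectivity: minimum 1.
    if local_connectivity < 1:
        errors.append("local_connectivity must be at least 1")

    # Minimum distance: between 0 and 1 (inclusive bounds are an assumption).
    if not 0 <= min_dist <= 1:
        errors.append("min_dist must be between 0 and 1")

    if errors:
        raise ValueError("; ".join(errors))

    return {
        "n_neighbors": n_neighbors,
        "local_connectivity": local_connectivity,
        "min_dist": min_dist,
    }


# Example values chosen for illustration only.
params = check_umap_params(n_neighbors=15, local_connectivity=1, min_dist=0.1)
```

The returned dictionary could then be passed as keyword arguments to a UMAP implementation that accepts these names.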
For an explanation of UMAP hyperparameters, refer to this blog written by Andy Coenen and Adam Pearce, which you may find informative.
Australian Microbiome integration
The Australian Microbiome Initiative (AMI) is an Australian Government-funded project to catalogue microbiome data and accompanying metadata gathered across various regions of Australia. To learn more about protocols, metadata methods, and raw data generation, visit the official Australian Microbiome Initiative website by clicking on the picture on the left.
The bacterial profiles from AMI samples, which were generated using the AGRF DivPro pipeline, will be merged with either the built-in test dataset or an uploaded dataset. The combined dataset will be plotted using UMAP and linked to a geographical map, where the coordinates for the test data are predicted. In this visualization, new batch (test) samples are represented with the usual position icon (a hot air balloon shape), while AMI samples are displayed as semi-transparent circles. You can use the rectangle or lasso selection tools on the UMAP plot to select clusters of nearby samples, see how they align on the geographical map, and gather metadata for the selected samples in the linked interactive table.
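The merging step can be pictured with a minimal sketch. It is an illustration under stated assumptions, not the application's actual procedure: the function `merge_profiles`, the sample identifiers, and the taxa below are hypothetical, and filling a taxon that is missing from a sample with an abundance of 0 is an assumption made here to give every sample the same columns.

```python
def merge_profiles(ami, batch):
    """Combine two {sample_id: {taxon: abundance}} tables into one matrix.

    Hypothetical helper: returns the column order (taxa), the row order
    (sample_ids), a per-row source label, and the abundance matrix.
    """
    # Union of taxa across both datasets, sorted for a stable column order.
    taxa = sorted({taxon
                   for table in (ami, batch)
                   for profile in table.values()
                   for taxon in profile})

    sample_ids, sources, matrix = [], [], []
    for source, table in (("AMI", ami), ("batch", batch)):
        for sample_id, profile in table.items():
            sample_ids.append(sample_id)
            # The source label lets a plot draw the two groups differently.
            sources.append(source)
            # Assumption: a taxon absent from a sample has abundance 0.
            matrix.append([profile.get(taxon, 0.0) for taxon in taxa])
    return taxa, sample_ids, sources, matrix


# Made-up profiles for illustration only.
ami = {"ami_1": {"Bacillus": 0.6, "Pseudomonas": 0.4}}
batch = {"new_1": {"Bacillus": 0.2, "Streptomyces": 0.8}}

taxa, sample_ids, sources, matrix = merge_profiles(ami, batch)
```

Only the abundance matrix would feed the embedding; the source labels play the same role as the marker styles described above, distinguishing new batch samples from AMI samples.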