Modern scanning transmission electron microscopes (STEMs) routinely produce very large datasets with a variety of signals, ranging from conventional integrated scattering (annular bright- or dark-field), to X-ray spectroscopy, electron energy loss spectroscopy, and, of particular interest in this study, localised diffraction. While the signal efficiency can be very high in STEM, the available information can be lost or neglected when using traditional data analysis techniques. However, the ever-increasing interest in (and availability of) “big data” and data mining technologies has led to a wealth of techniques suitable for processing STEM data in more intelligent and meaningful ways.
Data clustering represents one such technique. Clusters are groups of data points which exhibit similar features, such as scan pixels which have similar diffraction patterns. However, clustering is not straightforwardly applicable to diffraction data, which typically have a very large number of features, meaning that the “distance” metrics used in most algorithms work poorly.1 Moreover, standard clustering methods are suitable only where the data are well separated, which is not the case for diffraction data, which often exhibit considerable overlap.
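The difficulty with distance metrics in high dimensions can be illustrated with a short numerical sketch. It is not taken from this study: it uses uniformly random points (an assumption, standing in for real diffraction patterns) and the hypothetical helper `relative_contrast` to show that the nearest and farthest neighbours of a query point become almost equally distant as the number of features grows.

```python
import numpy as np

def relative_contrast(n_points, n_features, seed=0):
    """Return (d_max - d_min) / d_min for Euclidean distances from one
    random query point to n_points random points in [0, 1]^n_features."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, n_features))
    query = rng.random(n_features)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

# The contrast between "near" and "far" shrinks as the feature count grows,
# which is what makes distance-based clustering unreliable on raw patterns.
for n_features in (2, 100, 10_000):
    print(n_features, relative_contrast(500, n_features))
```

A single diffraction pattern recorded on a pixelated detector can easily have tens of thousands of features, which places raw patterns in the regime where this contrast is small.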
The first problem is most readily solved by taking advantage of recent work on applying dimensionality reduction methods, such as principal component analysis (PCA) and non-negative matrix factorisation (NMF), to diffraction data. Alone, these methods are capable of extracting relevant features from STEM data, but only in fairly ideal cases.2 However, these methods do preserve the essential structure of the data, allowing clustering to find those features which are actually well-related. The cluster results can then be reprojected into the original, higher-dimensional space for interpretation. The second problem can be solved using fuzzy clustering methods, which allow data points to belong to several clusters simultaneously, under algorithm-dependent constraints.3
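The reduce, cluster, and reproject workflow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: PCA (via an SVD) stands in for the decomposition step, the clustering follows the standard fuzzy c-means update rules, and the data are synthetic mixtures of two random "patterns". The function names, the fuzzifier `m=2.0`, and all array sizes are illustrative choices.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto the leading principal components."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    components = vt[:n_components]                 # (n_components, n_features)
    scores = (X - mean) @ components.T             # (n_samples, n_components)
    return scores, components, mean

def fuzzy_c_means(Z, n_clusters, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Fuzzy c-means: returns cluster centres and a membership matrix U
    of shape (n_samples, n_clusters) whose rows each sum to 1."""
    rng = np.random.default_rng(seed)
    U = rng.random((Z.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        centres = (W.T @ Z) / W.sum(axis=0)[:, None]   # membership-weighted means
        dist = np.linalg.norm(Z[:, None, :] - centres[None, :, :], axis=2)
        inv = np.fmax(dist, 1e-12) ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)   # closer centre, larger membership
        shift = np.abs(U_new - U).max()
        U = U_new
        if shift < tol:
            break
    return centres, U

# Synthetic stand-in for a scan: 80 pixels of pattern A, 40 pixels where
# A and B overlap equally, 80 pixels of pattern B, plus Gaussian noise.
rng = np.random.default_rng(1)
pattern_a, pattern_b = rng.random(400), rng.random(400)
weights_a = np.concatenate([np.ones(80), np.full(40, 0.5), np.zeros(80)])
X = np.outer(weights_a, pattern_a) + np.outer(1 - weights_a, pattern_b)
X += 0.05 * rng.standard_normal(X.shape)

scores, components, mean = pca_reduce(X, n_components=2)   # reduce
centres, U = fuzzy_c_means(scores, n_clusters=2)           # cluster
centre_patterns = centres @ components + mean              # reproject to 400 features
```

In this toy case the overlapping pixels receive memberships close to 0.5 in both clusters rather than being forced into one, which is the behaviour that motivates fuzzy over hard clustering for overlapping diffraction signals.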
In this study, clustering has been applied to a number of real experimental datasets, proving to be capable of (a) accurately extracting the spatial location of unique sample orientations/phases, and (b) separating the unique diffraction signals from those phases.
(a) is achieved via “direct” clustering – the diffraction patterns at each scan pixel are compared, and similar patterns are brought together. Figure 1 shows an example of this from a part of a NiFe sample that contains a number of different superstructures of the conventional cubic Ni structure. On the left are component “diffraction patterns” derived from NMF alone. On the right are patterns determined from clustering. The latter do not exhibit the incomplete summation artefacts typical of the NMF patterns, such as sub-background intensity or “doughnut” profiles, and are therefore significantly easier to interpret and to associate with physical quantities – note, for example, the clear presence of superlattice peaks in cluster 0. This may have useful consequences in, for example, pattern matching, or automatic separation of constituent phases.
(b) can be thought of as “inverted” clustering. Each pixel in diffraction space is associated with some real-space signal, and using clustering to group these together distinguishes diffraction spots which produce unique signals, as well as finding unique regions of the sample. Figure 2 shows the result of this method applied to a dataset acquired from a GaAs nanowire containing twin defects. Clusters 0 and 5 represent diffracted beams corresponding uniquely to each twin, and cluster 4 represents diffraction spots that are the same in either orientation. This method also overcomes the local bending and thickness effects in the dataset that made automatic identification of the twin phases difficult. Clusters 1, 2, and 3 represent variation in the direct beam and background intensity.
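The difference between “direct” and “inverted” clustering comes down to which axis of the data matrix is treated as the samples. The sketch below assumes a four-dimensional array indexed as (scan y, scan x, detector y, detector x). That layout, the helper names, and the noise-free toy dataset are all assumptions for illustration. Grouping identical rows with `np.unique` is used only as a stand-in for a real clustering step.

```python
import numpy as np

def direct_layout(data):
    """(ny, nx, ky, kx) -> (ny*nx, ky*kx): one sample per scan pixel,
    with the diffraction pattern as its feature vector."""
    ny, nx, ky, kx = data.shape
    return data.reshape(ny * nx, ky * kx)

def inverted_layout(data):
    """(ny, nx, ky, kx) -> (ky*kx, ny*nx): one sample per diffraction
    pixel, with its real-space image as the feature vector."""
    return direct_layout(data).T

# Toy "twinned" sample: twin A fills the left half of the scan, twin B the right.
ny, nx, ky, kx = 6, 8, 4, 4
data = np.zeros((ny, nx, ky, kx))
data[:, : nx // 2, 0, 0] = 1.0   # reflection unique to twin A
data[:, nx // 2 :, 3, 3] = 1.0   # reflection unique to twin B
data[:, :, 1, 2] = 1.0           # reflection common to both twins

direct = direct_layout(data)      # shape (48, 16)
inverted = inverted_layout(data)  # shape (16, 48)

# Stand-in for clustering: detector pixels with identical real-space
# signals receive the same label (background, A-only, B-only, common).
unique_rows, labels = np.unique(inverted, axis=0, return_inverse=True)
label_map = labels.ravel().reshape(ky, kx)

# The real-space signal of one diffraction pixel is recovered by
# reshaping its row back onto the scan grid.
twin_a_map = inverted[0 * kx + 0].reshape(ny, nx)
```

With real data the rows are noisy and overlapping, so the exact-match grouping would be replaced by a clustering algorithm applied to `inverted`, optionally after dimensionality reduction.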
These techniques are relatively straightforward to implement, rapid, and scale well with the size of the datasets. Ongoing work is focused on new experimental data, as well as algorithmic work to reduce the computational overhead associated with decomposing the data.
1. Kailing, K., Kriegel, H., & Kröger, P. (2004). Density-connected subspace clustering for high-dimensional data. Proc. SDM. http://epubs.siam.org/doi/abs/10.1137/1.9781611972740.23
2. Lee, D. D., & Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755), 788–791. doi:10.1038/44565
3. Bezdek, J. C., Ehrlich, R., & Full, W. (1984). FCM: The fuzzy c-means clustering algorithm. Computers & Geosciences, 10(2-3), 191–203. doi:10.1016/0098-3004(84)90020-7
Figures:

Figure 1: Diffraction patterns of an iron-nickel meteorite sample: NMF decomposition factors (left) and cluster centres (right). Note the improvement in ease of interpretation and reduction in doughnut artefacts, as well as the clarification of the superlattice peaks in cluster centre 5.

Figure 2: "Inverted" cluster memberships (left) and their respective centres (right). This method is able to distinguish which diffraction points are unique to each distinct twin. Note that the reflections common to both twins are also distinguished (cluster 4).
To cite this abstract:
Ben Martineau, Alexander Eggeman; Clustering for scanning transmission electron diffraction data. The 16th European Microscopy Congress 2016, Lyon, France. https://emc-proceedings.com/abstract/clustering-for-scanning-transmission-electron-diffraction-data/. Accessed: December 2, 2023