A Comparison of Supervised and Unsupervised Learning

A common question in materials informatics when choosing data science tools is:

When can I use supervised learning (e.g. Linear Discriminant Analysis)?

This post was inspired by a GitHub issue.

The Experiment

The Datasets

There are 40 micrographs whose locations are given by site.data['set*'].

Feature Identification

  • Segment the images

    The segmentation procedure happens in readsets.m.

Feature Encoding

  • Compute the cross-correlation for the segmented image
  • Truncate the statistics to a cutoff of $\pm 100$ pixels.
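
The two encoding steps above can be sketched in Python with NumPy. This is a minimal illustration, not the code used for the post: it assumes a two-phase segmented image, periodic boundaries (so the cross-correlation can be computed with the FFT), and a random stand-in image. The function names `cross_correlation` and `truncate` are hypothetical.

```python
import numpy as np

def cross_correlation(phase_a, phase_b):
    """Periodic cross-correlation of two indicator images via the FFT."""
    fa = np.fft.fft2(phase_a)
    fb = np.fft.fft2(phase_b)
    # Correlation theorem: corr = IFFT(conj(FA) * FB), normalized by pixel count.
    corr = np.fft.ifft2(np.conj(fa) * fb).real / phase_a.size
    # Move the zero-shift vector to the center of the array.
    return np.fft.fftshift(corr)

def truncate(stats, cutoff):
    """Keep shifts in [-cutoff, cutoff] around the center along both axes."""
    cy, cx = stats.shape[0] // 2, stats.shape[1] // 2
    return stats[cy - cutoff:cy + cutoff + 1, cx - cutoff:cx + cutoff + 1]

rng = np.random.default_rng(0)
# Stand-in for a segmented micrograph (assumption: binary, two phases).
segmented = (rng.random((256, 256)) > 0.5).astype(float)
stats = truncate(cross_correlation(segmented, 1.0 - segmented), cutoff=100)
```

Keeping shifts from -100 to 100 inclusive yields a 201x201 array, with the zero-shift value at the center.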

Data Analytics

Three tests are executed.

  1. PCA on the expected value of the cross-correlation over the statistical volume elements (SVEs).
  2. PCA on each of the SVEs, treated as independent realizations.
  3. LDA on each of the SVEs.

    The LDA class labels are defined so that SVEs that come from the same image share a class.

Visualization

The interactive visualization tools are made using d3.js.

  • The smaller, more opaque points indicate individual SVEs.
  • The large points are the centroids of the SVE embeddings for ONE image.
  • Mouse over a large point to highlight its SVEs and parent image.

PCA of the cross correlations

  1. Cross-correlations were computed for all segmented images.
  2. The cross-correlations were truncated to shifts of -100..100, a window of roughly 200x200 pixels.
  3. PCA was performed on the truncated statistics.
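
As a rough illustration of the last step, PCA can be written as a singular value decomposition of the mean-centered feature matrix. The sketch below uses random numbers as a stand-in for the 40 flattened, truncated cross-correlations; the function name `pca` and the choice of two components are assumptions made for the example.

```python
import numpy as np

def pca(samples, n_components=2):
    """Project the rows of `samples` onto their leading principal components."""
    centered = samples - samples.mean(axis=0)
    # SVD of the centered data: the rows of vt are the principal directions.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]

rng = np.random.default_rng(0)
# Stand-in data: 40 images, each a flattened 201x201 truncated cross-correlation.
features = rng.random((40, 201 * 201))
scores, explained = pca(features, n_components=2)
```

Each row of `scores` is one image in the reduced embedding, and `explained` gives the fraction of variance captured by each retained component.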

Reduced Embedding Visualizer

Mouse over any point to reveal pertinent information. Click a point to mark it, so you can keep track of which ones you have visited.

PCA of cross correlations in SVEs

  1. Nine $200 \times 200$ pixel SVEs were extracted from each image.
  2. Segmentation was performed.
  3. The cross-correlations were computed on each SVE for shifts of -100..100.
  4. Unsupervised principal component analysis (PCA) was performed over the set of SVEs, with each SVE treated as one sample.
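
The post does not say how the nine SVEs were positioned within each image. The sketch below assumes an evenly spaced 3x3 grid of windows, which is only one plausible choice; the function name `extract_sves` and the 600x800 stand-in image are hypothetical.

```python
import numpy as np

def extract_sves(image, size=200, grid=3):
    """Cut a grid x grid set of size x size windows, evenly spaced over the image."""
    rows = np.linspace(0, image.shape[0] - size, grid).astype(int)
    cols = np.linspace(0, image.shape[1] - size, grid).astype(int)
    return np.stack([image[r:r + size, c:c + size] for r in rows for c in cols])

# Stand-in micrograph (assumption: at least 200 pixels along each axis).
image = np.arange(600 * 800, dtype=float).reshape(600, 800)
sves = extract_sves(image)  # shape (9, 200, 200)
```

Depending on the image size, windows placed this way may overlap or leave gaps; random placement would be an equally reasonable alternative.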

The Embedding

  • The small points indicate each individual SVE.
  • The color classifies like points.
  • The large nodes are the centroids of the small points. They represent an ensemble of the image statistics.
  • When a centroid is hovered over, its SVEs are highlighted.
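
The centroids described above are the mean embedded coordinates of the SVEs that share a parent image. A minimal sketch, using a tiny hand-made embedding and the hypothetical names `centroids_by_image` and `image_index`:

```python
import numpy as np

def centroids_by_image(embedding, image_index):
    """Average the embedded SVE coordinates that share an image index."""
    labels = np.unique(image_index)
    centers = np.array([embedding[image_index == k].mean(axis=0) for k in labels])
    return labels, centers

# Stand-in embedding: 2 images x 3 SVEs, two components each.
embedding = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 3.0],
                      [10.0, 10.0], [12.0, 10.0], [11.0, 13.0]])
image_index = np.array([0, 0, 0, 1, 1, 1])
labels, centers = centroids_by_image(embedding, image_index)
```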

Reduced Embedding Visualizer

Mouse over any point to reveal pertinent information. Click a point to mark it, so you can keep track of which ones you have visited.

LDA of cross correlations in SVEs

  1. Nine $200 \times 200$ pixel SVEs were extracted from each image.
  2. Segmentation was performed.
  3. The cross-correlations were computed on each SVE for shifts of -100..100.
  4. Supervised LDA was performed over the set of SVEs. The class of each SVE is the index of the image it comes from, so SVEs from different images belong to different classes.
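
To show how the image index acts as the class label, here is a minimal Fisher LDA sketch in NumPy. It is not the implementation used for the post. It assumes a small number of features per SVE (for example, after a prior dimensionality reduction), because the within-class scatter matrix is singular when features far outnumber samples. The data are synthetic and the function name `lda` is hypothetical.

```python
import numpy as np

def lda(samples, labels, n_components=2):
    """Project samples onto the leading Fisher discriminant directions."""
    mean_all = samples.mean(axis=0)
    n_features = samples.shape[1]
    s_within = np.zeros((n_features, n_features))
    s_between = np.zeros((n_features, n_features))
    for k in np.unique(labels):
        group = samples[labels == k]
        mean_k = group.mean(axis=0)
        s_within += (group - mean_k).T @ (group - mean_k)
        diff = (mean_k - mean_all)[:, None]
        s_between += len(group) * (diff @ diff.T)
    # Directions that maximize between-class scatter relative to within-class scatter.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(s_within) @ s_between)
    order = np.argsort(eigvals.real)[::-1][:n_components]
    return (samples - mean_all) @ eigvecs[:, order].real

rng = np.random.default_rng(0)
# Stand-in data: 3 images x 9 SVEs, 5 features per SVE (assumption).
labels = np.repeat(np.arange(3), 9)  # class of an SVE = index of its parent image
means = np.array([[0, 0, 0, 0, 0], [5, 0, 0, 0, 0], [0, 5, 0, 0, 0]], dtype=float)
samples = means[labels] + rng.normal(scale=0.5, size=(27, 5))
embedded = lda(samples, labels, n_components=2)
```

Because the labels tell LDA which SVEs belong together, the embedding pulls SVEs from the same image toward each other, which PCA has no information to do.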

Reduced Embedding Visualizer

Mouse over any point to reveal pertinent information. Click a point to mark it, so you can keep track of which ones you have visited.