# Assignment 4: MDS

- Due Mar 8, 2016 by 9am
- Points 10
- Submitting a file upload
- Available Mar 1, 2016 at 9am - Mar 13, 2016 at 9am (12 days)

## Multidimensional Scaling (MDS)

**CLPS 1291: Assignment 4**

Due: **9:00AM** on **Mar 8**

Before you get started:

Look at the Assignment Guidelines for formatting and coding style information, submission guidelines, etc. If you have any questions related to the assignment, please post them in this Discussion.

**As a reminder, we will neither accept answers that fail to follow the given template, nor consider code written outside of the allotted space. We will only review functions that follow our conventions and results documents submitted in the requested form.**

You will need to use this skeleton code and text file to complete the assignment. See the skeleton code for more helpful guidelines.

We expect you to turn in the following:

- results.pdf - a PDF containing your outputs and descriptions.
- assignment4.zip - a zip file containing:
  - assignment4.m - a MATLAB script containing all of the code necessary for this assignment

**1. Perceived Similarity of Colors**

Multidimensional scaling (MDS) can be used to represent the perceived similarity (or dissimilarity!) of items. With this algorithm, we can use a set of dissimilarity ratings (i.e., pairwise distances between individual items) to reconstruct a plot representing the relative arrangement of items in psychological space. MDS assumes that, as perceptual similarity between a pair of items **increases**, their physical separation in psychological space **decreases**. Here, we will run MDS on sets of dissimilarity judgments about colors and animals!

The specific instructions for this assignment can be found within the skeleton code. We give a general overview below. Please read the instructions carefully.

**1. a) Load color dataset**

In this section, you will load the provided MATLAB data. One of the datasets we will be using in this assignment is contained in a file called 'colors.mat'. This contains a structure with two fields:

- names = 1x14 cell array containing the wavelengths of 14 colors (To get an idea of what colors these wavelengths correspond to, look at the table at http://en.wikipedia.org/wiki/Color).
- dsim = 14x14 matrix of perceived dissimilarities between each pair of colors, as determined by human judgments. These values have been adjusted to be on a scale from 0 to 1. A value of 1 means that two elements are the most dissimilar, while a value of 0 means that they are identical.
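As a rough illustration of loading a .mat file into a structure and reading its fields, here is a minimal sketch. It writes a small toy file first so that it runs on its own; the file name `toy_colors.mat`, the variable names `colorNames` and `colorDsim`, and all values are made up for the example and are not the course data.

```matlab
% Toy stand-in for the provided file (hypothetical values, not the real data).
names = {'434', '445', '465'};
dsim  = [0.0 0.2 0.7;
         0.2 0.0 0.5;
         0.7 0.5 0.0];
toyFile = fullfile(tempdir, 'toy_colors.mat');
save(toyFile, 'names', 'dsim');
clear names dsim

% Calling load with an output argument returns a structure whose fields
% are the variables stored in the file.
S = load(toyFile);
colorNames = S.names;   % cell array of labels
colorDsim  = S.dsim;    % square dissimilarity matrix
disp(size(colorDsim))
```

Pulling the fields out into their own variables keeps later code shorter than writing `S.dsim` everywhere.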

**1. b) Visualize color dissimilarity matrix**

In order to visualize the dissimilarity matrix, you will need to use the following set of functions:

- 'figure': creates a new figure window
- 'imagesc': displays your data as a color map (with one pure color representing the highest value, another pure color representing the lowest value, and a gradient of in-between colors representing the in-between values).
- 'colorbar': makes sure a color scale (like a figure legend) appears next to your dissimilarity matrix.

Put all these commands together to plot your matrix! **Add this figure to 'results.pdf.'**
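The three functions listed above might be combined roughly as follows. This is only a sketch on a made-up 3x3 matrix (`toyDsim` is a hypothetical name), not the assignment data or the required solution.

```matlab
% Toy symmetric dissimilarity matrix (made-up values, not the course data).
toyDsim = [0.0 0.3 0.9;
           0.3 0.0 0.6;
           0.9 0.6 0.0];

figure;                    % open a new figure window
hImg = imagesc(toyDsim);   % draw the matrix as a color map
colorbar;                  % add the color scale next to the plot
```

The diagonal should show the color for 0, since every item is identical to itself.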

**1. c) Run Multidimensional Scaling**

In this section you will run MDS to try to recover people's psychological spaces. There is a function you can use to complete this task -- see if you can figure out what it is!

HINT: If you've done this correctly, the function should create a 14x2 matrix (one row per color) of values between -1 and 1. Make sure that you have this before trying to plot your results!

**1. d) Plot MDS Results**

Visualize the MDS results in a scatter plot that will act as a 2-dimensional representation of the dissimilarity between the colors.

What do you notice about the relative locations of these data points? Does the scatter plot remind you of anything (if so, what and why)? Add this figure and your response to the above questions to 'results.pdf.'

**1. e) Calculate Pairwise Distances by Hand**

Now, we need to compute the Euclidean distance between each pair of objects in our data matrix. The 'Euclidean distance' is the length of the straight line between two points in 2D space. To compute it, we just need to use the distance formula (the same as the Pythagorean theorem): for two points (x1, y1) and (x2, y2), the distance is sqrt((x1 - x2)^2 + (y1 - y2)^2).

Use this formula to calculate the pairwise distances between every pair of points in your MDS results (the coordinates you plotted in 1. d)).

We also want to time how long it takes MATLAB to run these calculations. (See skeleton code for instructions.)
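To make the by-hand calculation and the timing concrete, here is a minimal sketch on three made-up 2D points. The names `toyXY`, `toyDist`, and `elapsedLoop` are hypothetical, and the skeleton code may ask for a different layout; `tic` and `toc` are the built-in stopwatch functions.

```matlab
% Toy 2-D coordinates, one row per item (made-up values).
toyXY = [0 0;
         3 4;
         6 8];
n = size(toyXY, 1);
toyDist = zeros(n);                      % n-by-n matrix of pairwise distances

tic;                                     % start the stopwatch
for i = 1:n
    for j = 1:n
        dx = toyXY(i, 1) - toyXY(j, 1);
        dy = toyXY(i, 2) - toyXY(j, 2);
        toyDist(i, j) = sqrt(dx^2 + dy^2);   % distance formula
    end
end
elapsedLoop = toc;                       % seconds elapsed since tic

disp(toyDist)
fprintf('Loop took %.6f seconds\n', elapsedLoop);
```

The resulting matrix is symmetric with zeros on the diagonal, because the distance from a point to itself is 0 and the distance from i to j equals the distance from j to i.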

**1. f) Calculate Pairwise Distances Using 'pdist'**

To make our computation run faster, we want to use the built-in function 'pdist' to generate a vector of the Euclidean pairwise distances between the data points in our scatter plot.

HINT: When using this function, make sure you are asking for the **Euclidean** distance! Use 'help' to check that you're doing this correctly. Time this as well.
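A call to 'pdist' with the distance type spelled out might look like the sketch below. It assumes the Statistics and Machine Learning Toolbox is installed (that is where 'pdist' lives) and uses made-up points; `toyXY`, `dVec`, and `elapsedPdist` are hypothetical names.

```matlab
% Toy 2-D coordinates, one row per item (made-up values).
toyXY = [0 0;
         3 4;
         6 8];

tic;
dVec = pdist(toyXY, 'euclidean');   % row vector, one entry per pair of rows
elapsedPdist = toc;

disp(dVec)
fprintf('pdist took %.6f seconds\n', elapsedPdist);
```

For n points the output has n*(n-1)/2 entries, ordered as the pairs (2,1), (3,1), ..., (n,1), (3,2), and so on. On a dataset this small, timings are noisy, so a single run may not show a clear difference.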

After running the script, check the elapsed time in the command window. How does this compare to the time it took to compute the pairwise distances by hand? Why do you think this is? Add your response and the elapsed times from parts 1. e) and 1. f) to 'results.pdf.'

**1. g) Create Distance and Similarity Matrices**

The 'pdist' function should output a row vector. However, we'll need to transform this vector into something we can plot more easily. (See skeleton code for instructions.)
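One common way to move between the vector form and a square matrix is the built-in 'squareform' function (also part of the Statistics and Machine Learning Toolbox). Whether the skeleton code expects this particular function is an assumption; the sketch below only shows what the conversion does on a made-up vector.

```matlab
% A pdist-style row vector for three items (made-up values),
% ordered as the pairs (2,1), (3,1), (3,2).
dVec = [5 10 5];

dMat  = squareform(dVec);   % vector -> symmetric matrix with zero diagonal
dBack = squareform(dMat);   % matrix -> vector again

disp(dMat)
disp(dBack)
```

The round trip returns the original vector, which is a quick way to check that no pair was dropped or reordered.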

**1. h) Experimental similarity measures vs. Recovered distances in psychological space**

Now, we can plot the experimental similarity measures (i.e., those obtained via human judgements) against the calculated distances between those same items in psychological space. Create a new figure, and use 'scatter' to generate another scatter plot.

REMEMBER: We are trying to plot **similarity** measures, but your experimental measures are currently in terms of **dissimilarity**! To go from a dissimilarity to a similarity matrix, you just need to subtract the former from 1.
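The conversion and the plot could be sketched as follows. All values are made up, `toyDist` stands in for distances recovered from an MDS plot, and the choice of which quantity goes on which axis is an assumption for the example, not a requirement stated here.

```matlab
% Made-up dissimilarities and made-up recovered distances for three items.
toyDsim = [0.0 0.2 0.8;
           0.2 0.0 0.5;
           0.8 0.5 0.0];
toyDist = [0.0 0.3 1.1;
           0.3 0.0 0.9;
           1.1 0.9 0.0];

toySim = 1 - toyDsim;          % dissimilarity -> similarity

% Keep each pair once: the entries below the diagonal.
n = size(toyDsim, 1);
mask = tril(true(n), -1);
simVec  = toySim(mask);
distVec = toyDist(mask);

figure;
scatter(simVec, distVec);
xlabel('Empirical similarity');
ylabel('Distance in psychological space');
```

Using the same mask for both matrices guarantees that each similarity value is paired with the distance for the same two items.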

What do you notice about the relationship between empirical measures of similarity and distance in psychological space? How might these results relate to Shepard's universal law of generalization? Add this figure and your response to the above questions to 'results.pdf.'

**2. Perceived Similarity of Animals**

Now, we will essentially repeat all the steps we took with the color dataset, but using a dataset comparing the similarity of animals!

**2. a) Loading Animal Dataset**

The animal dataset is contained within a file called 'animals.mat'. This contains two saved variables (much like 'colors.mat'):

- names = 10x1 cell array containing the names of 10 animals.
- dsim = 10x10 matrix of perceived dissimilarities between each animal, as determined by human judgments. These values have also been adjusted to be on a scale from 0 to 1, just like the colors dsim matrix.

Load the file 'animals.mat' and save it as a variable, then save the fields within this structure as their own variables.

**2. b) Visualize Animal Dissimilarity Matrix**

Create a new figure that visualizes your animal dissimilarity matrix as a color map (make sure to include a colorbar!). Use the animal names to label your x and y axes, and give your figure a descriptive title.
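One way to put item names on the axes of a color map is to set the tick positions and tick labels on the current axes, as in this sketch. The names and values are made up (`toyNames` and `toyDsim` are hypothetical), and other approaches work as well.

```matlab
% Made-up labels and dissimilarities for three items.
toyNames = {'cat', 'dog', 'eel'};
toyDsim  = [0.0 0.2 0.9;
            0.2 0.0 0.8;
            0.9 0.8 0.0];
n = numel(toyNames);

figure;
imagesc(toyDsim);
colorbar;

% One tick per row/column, labeled with the item names.
set(gca, 'XTick', 1:n, 'XTickLabel', toyNames, ...
         'YTick', 1:n, 'YTickLabel', toyNames);
title('Toy dissimilarity matrix');
```

Setting the ticks explicitly to 1:n matters, because the default ticks may not line up one-to-one with the rows and columns.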

**2. c) Run Multidimensional Scaling and Plot Results**

Use the same function as before to run MDS on the animal dissimilarity matrix.

Now, visualize the MDS results as a scatter plot (just like before!). Here, our scatter plot will act as a 2-dimensional representation of the dissimilarities between each animal.

Label each individual data point (again, use the 'text' function) and your x and y axes, and give your scatter plot a descriptive title.
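Labeling points with 'text' might look like the sketch below. The coordinates and names are made up (`toyXY` and `toyNames` are hypothetical stand-ins for MDS output and item names), and the small horizontal offset is just a cosmetic choice so the labels do not sit on top of the markers.

```matlab
% Made-up 2-D coordinates and labels for three items.
toyNames = {'cat', 'dog', 'eel'};
toyXY = [-0.4  0.1;
         -0.3  0.2;
          0.7 -0.3];

figure;
scatter(toyXY(:, 1), toyXY(:, 2), 'filled');

% text accepts vectors of x and y positions plus a cell array of labels,
% and places one label per point.
hTxt = text(toyXY(:, 1) + 0.02, toyXY(:, 2), toyNames);

xlabel('Dimension 1');
ylabel('Dimension 2');
title('Toy MDS layout');
```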

Describe the relative arrangement of these data points. Do you notice any patterns or groups emerging (and if so, why do you think that is)? How does this plot compare with your color plot? Add this figure and your response to the above questions to 'results.pdf.'

**2. d) Create Distance and Similarity Matrices**

Here, you'll be repeating the steps from 1. f) and 1. g), using your animal data. Generate a vector of the Euclidean pairwise distances between each datapoint in our scatter plot, and transform this vector (and your dissimilarity matrix) into vectors we can plot later.

**2. e) Experimental Similarity Measures vs. Recovered Distances in Psychological Space**

Again, we'll plot the experimental similarity measures against the calculated distances between those same items in psychological space. Create a new figure, and use 'scatter' to generate a scatter plot. Make sure to label your x and y axes and add a title! Remember, we want to plot **similarity** measures (not dissimilarity!).

Is this relationship similar to the one you observed with the color data? If you see any differences, why do you think they exist? Add this figure and your response to the above questions to 'results.pdf.'

**EXTRA CREDIT!**

This is your chance to run MDS on additional data sets! The following website contains lots of sets of similarity data you can choose from: http://faculty.sites.uci.edu/mdlee/similarity-data/

Pick a couple of data sets that you're interested in, and go through the same steps you did for the color and animal data above.

Fair warning: Some of these datasets can be a little messy! You'll need to be very careful in checking whether the matrices represent **similarity** or **dissimilarity**. If you aren't sure, visualize the data as a color map. If the diagonal of the matrix = 0, it's a **dissimilarity** matrix. If the diagonal = 1, it's a **similarity** matrix. If the matrix you have isn't in the form you want, subtract it from 1!
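The diagonal check described above can also be done in code. This sketch assumes the matrix is already scaled to the range 0 to 1, as the rule requires; `toyMat`, `kind`, and `toyDsim` are hypothetical names, and the values are made up.

```matlab
% Made-up matrix with ones on the diagonal, so it should be read as similarity.
toyMat = [1.0 0.7 0.1;
          0.7 1.0 0.4;
          0.1 0.4 1.0];

d = diag(toyMat);
if all(d == 0)
    kind = 'dissimilarity';
    toyDsim = toyMat;            % already in the form MDS expects
elseif all(d == 1)
    kind = 'similarity';
    toyDsim = 1 - toyMat;        % convert to dissimilarity
else
    error('Diagonal is neither all 0 nor all 1; inspect the data by hand.');
end

fprintf('Detected a %s matrix\n', kind);
```

If the diagonal is neither all 0 nor all 1, the data probably needs rescaling or cleaning first, which is why the sketch stops with an error rather than guessing.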

How well does the psychological space recovered by MDS seem to agree with your expectations for how the data should be organized? Are the points you think are similar located close together in the map? If not, why do you think that is? Is there a better way to represent the data you chose? Add all of your figures and responses to the questions above to 'results.pdf.'