Statistical results from the People Matching Project Friend Pairs Survey #2

[Updated 8 March 2018]

This page has the documentation for the statistics files derived from the survey. For theory and everything else, see the main page of the People Matching Project research program.

The first friend pairs survey ran May to July 2016 as part of the effort to develop an open-source interpersonal comparability algorithm. People were recruited to sign up and take the survey, then get one of their best friends to sign up and take the same survey, with the two surveys linked together so that the questions on which best friends are most similar could be determined.

In total, 1372 copies of the survey were submitted, but many people apparently never succeeded in getting their friend to take the linked survey, so the yield was only 325 complete pairs (650 surveys).

The main body of the survey contained 160 items (e.g. "I love to party") that were rated on a five-point scale of agreement (1=Disagree, 5=Agree). In addition to the main body, each person answered demographic questions and questions about their relationship with their friend (e.g. length of acquaintance). For the averages of these questions, broken down by the gender configuration of the pairs, see PairedSurvey2-friendship-demographics.csv.

The text of the items, and their basic statistical properties, can be downloaded from PairedSurvey2-items.csv. The main metric of item quality is the correlation between pairs. This is computed as Pearson's r, where the first column holds the responses from the first member of each pair, the second column holds the responses from the second member, and each row is a pair (so a correlation of 1.0 for a question would imply that all pairs of friends gave perfectly matching answers). This file also contains the mean and standard deviation of individual responses to each question. The number of pairs each correlation is calculated from varies slightly due to missing data (skipped questions).
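As an illustration of the between-pairs correlation described above, here is a minimal sketch in Python. The function name and the six example pairs are hypothetical (the raw responses are not public), and it assumes a skipped question is stored as NaN; it is not the code used to produce the files.

```python
import numpy as np

def pair_correlation(first, second):
    """Pearson's r between first-member and second-member responses to one item.

    Pairs in which either member skipped the item (NaN) are dropped.
    Returns (r, number of pairs used).
    """
    first = np.asarray(first, dtype=float)
    second = np.asarray(second, dtype=float)
    keep = ~(np.isnan(first) | np.isnan(second))
    r = np.corrcoef(first[keep], second[keep])[0, 1]
    return r, int(keep.sum())

# Hypothetical ratings (1-5) from six pairs on a single item.
# The last pair is dropped because its first member skipped the item.
first_members = [5, 4, 2, 1, 3, np.nan]
second_members = [4, 4, 1, 2, 3, 5]

r, n_pairs = pair_correlation(first_members, second_members)
print(f"r = {r:.3f} from {n_pairs} pairs")
```

Because pairs with a skipped answer are dropped item by item, the number of pairs behind each coefficient can differ slightly between items.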

Beyond the correlation between pairs on each item, it is useful to know the correlation between an item and all other items across pairs. This can be used to find underlying structure in the similarity between friends (i.e. some questions must be redundant). Each value should be interpreted as Person A's answer to question X correlated with Person B's answer to question Y. You will notice that the table is not symmetrical: X correlated with Y is not the same as Y correlated with X. These values should theoretically be the same (as who is the first member of each pair is arbitrary), but due to sampling error they differ in this data. Where a subject was missing data for a question, that subject was omitted only from the correlations involving that question (pairwise deletion), so each correlation is based on 422 to 435 observations. The correlations across pairs can be downloaded at PairedSurvey2-cross-correlations.csv.
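The sketch below shows one way such a cross-correlation table could be built with pandas, and why it comes out asymmetric. The data are synthetic scores, the item names Q1 to Q3 and the function name are hypothetical, and the averaging step at the end is only one option a reader might choose, not something done in the published file.

```python
import numpy as np
import pandas as pd

def cross_correlations(first, second):
    """Correlate each item answered by the first members (rows, X) with
    each item answered by the second members (columns, Y)."""
    a = first.add_suffix("_A")
    b = second.add_suffix("_B")
    # DataFrame.corr applies pairwise deletion: each coefficient uses the
    # rows where both of its two columns are present.
    full = pd.concat([a, b], axis=1).corr()
    block = full.loc[a.columns, b.columns].to_numpy()
    return pd.DataFrame(block, index=first.columns, columns=second.columns)

# Synthetic pairs: a shared component makes friends resemble each other.
rng = np.random.default_rng(0)
items = ["Q1", "Q2", "Q3"]
n_pairs = 200
shared = rng.normal(size=(n_pairs, len(items)))
first = pd.DataFrame(shared + rng.normal(size=shared.shape), columns=items)
second = pd.DataFrame(shared + rng.normal(size=shared.shape), columns=items)
first.iloc[0, 0] = np.nan  # one skipped question

cross = cross_correlations(first, second)
print(cross.round(2))

# Member order is arbitrary, so averaging the table with its transpose
# gives a symmetric estimate.
symmetric = (cross + cross.T) / 2
```

In the printed table the entry for row Q1, column Q2 generally differs from the entry for row Q2, column Q1, which is the asymmetry noted above.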

The standard (within-subject) correlations for the items may be useful for providing context for the between-subjects correlations. These are calculated from all persons who submitted the survey. Some subjects missed some questions. Where data were missing for a question, those rows were omitted only from the correlations involving that question. In the end, these correlations were calculated from between 1337 and 1363 observations. The simple correlations can be downloaded at PairedSurvey2-simple-correlations.csv.
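To show how the number of observations behind each coefficient can vary under this kind of deletion, here is a small sketch on synthetic data. The item names and the missing-data pattern are hypothetical; it relies on pandas' DataFrame.corr, which skips missing values pair by pair.

```python
import numpy as np
import pandas as pd

# Synthetic ratings (1-5) from 50 respondents on three hypothetical items.
rng = np.random.default_rng(1)
responses = pd.DataFrame(
    rng.integers(1, 6, size=(50, 3)).astype(float),
    columns=["Q1", "Q2", "Q3"],
)
responses.iloc[0:3, 0] = np.nan  # three respondents skipped Q1
responses.iloc[2:7, 1] = np.nan  # five respondents skipped Q2

# Item-by-item correlations; rows are dropped only for the item pairs
# in which they have a missing answer.
simple = responses.corr()

# Number of respondents who answered both items of each pair.
present = responses.notna().astype(int)
n_obs = present.T @ present

print(simple.round(2))
print(n_obs)
```

The second table makes the range of observation counts explicit: item pairs that involve frequently skipped questions rest on fewer respondents than the others.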

The raw, subject-level data is not available for public download because the pairing information creates an unusually high de-anonymization risk for a survey.

Copyright information

This page and the linked data files are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.