
An A-Z of Psychometric Testing Terminology



Anisa Zulfiqar of TalentLens Australia provides an A-Z of psychometric testing terminology.

Most of us will have come across these terms during our BPS Test User Ability and Personality training. The glossary below is intended as a reference tool to assist with your interpretation of psychometric assessments.


Aptitude

A combination of abilities and other characteristics, whether innate or acquired, which are indicative of an individual’s ability to learn or to develop proficiency in some particular area if appropriate education or training is provided.

Aptitude tests include those of general academic ability (commonly called mental ability or intelligence tests); those of special abilities, such as verbal, numerical, mechanical, or musical; tests assessing “readiness” for learning; and prognostic tests, which measure both ability and previous learning and are used to predict future performance, usually in a field requiring specific skills, such as speaking a foreign language, taking shorthand, or nursing.



Battery

A group of several tests standardised on the same population so that results on the several tests are comparable. The term is sometimes applied to any group of tests administered together, even though not standardised on the same subjects.



Correlation

The relationship between two sets of scores or measures, for example the tendency of one score to vary concomitantly with the other.

The existence of a strong relationship (i.e. a high correlation) between two variables does not necessarily indicate that one has any causal influence on the other. Correlations are usually denoted by a coefficient; the correlation coefficient most frequently used in test development and educational research is the Pearson or product-moment r. 

Unless otherwise specified, “correlation” usually refers to this coefficient. Correlation coefficients range from –1.00 to +1.00; a coefficient of 0.0 (zero) denotes a complete absence of linear relationship. Coefficients of –1.00 or +1.00 indicate perfect negative or positive relationships, respectively.
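As a rough illustration of the coefficient described above, the following is a minimal sketch of the Pearson product-moment r in plain Python. The function name `pearson_r` and the two score lists are made up for this example and do not come from any particular test.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the two means
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one set of scores does not vary")
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical scores for five candidates on two tests
verbal = [12, 15, 17, 20, 24]
numerical = [10, 14, 15, 21, 22]
print(round(pearson_r(verbal, numerical), 2))
```

A value close to +1.00 here would only show that the two sets of scores rise together; as noted above, it would not show that one causes the other.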


Normative data (norms)

Statistics that supply a frame of reference by which meaning may be given to obtained test scores. Norms are based upon the actual performance of individuals in the standardisation sample(s) for the test.

The most common types of norms are deviation IQ, percentile rank, grade equivalent, and stanine. Reference groups are usually those of specified occupations, age, grade, gender, or ethnicity.
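To make the idea of a frame of reference concrete, here is a small sketch of how a raw score might be converted to a deviation IQ using a norm group's mean and standard deviation. The function names are illustrative, and the scale of mean 100 and SD 15 is an assumption (a common convention, though some tests use a different SD); the norm-group figures are invented.

```python
def z_score(raw, norm_mean, norm_sd):
    """Express a raw score as a number of SDs above or below the norm-group mean."""
    if norm_sd <= 0:
        raise ValueError("norm-group SD must be positive")
    return (raw - norm_mean) / norm_sd

def deviation_iq(z, scale_mean=100.0, scale_sd=15.0):
    """Place a z-score on a deviation-IQ scale (mean 100, SD 15 assumed here)."""
    return scale_mean + scale_sd * z

# Hypothetical norm group: mean raw score 50, SD 10
z = z_score(65, norm_mean=50, norm_sd=10)
print(z, deviation_iq(z))
```

The same raw score would convert to a different standard score against a different reference group, which is why the choice of norm group matters.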


Percentile (P)

A point (score) in a distribution at or below which falls the percentage of cases indicated by the percentile. Thus a score coinciding with the 35th percentile (P35) is regarded as equalling or surpassing 35% of the persons in the group, such that 65% of the performances exceed this score. “Percentile” does not mean the percentage of correct answers on a test.
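The "at or below" definition above can be sketched in a few lines of Python. The function name `percentile_rank` and the reference group are hypothetical; published norm tables may handle tied scores differently.

```python
def percentile_rank(score, group_scores):
    """Percentage of the reference group scoring at or below the given score."""
    if not group_scores:
        raise ValueError("reference group is empty")
    at_or_below = sum(1 for s in group_scores if s <= score)
    return 100.0 * at_or_below / len(group_scores)

# Hypothetical reference group of ten raw scores
group = [8, 11, 12, 12, 13, 14, 14, 15, 17, 19]
print(percentile_rank(13, group))  # 5 of the 10 scores are at or below 13, so 50.0
```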

Use of percentiles in interpreting scores offers a number of advantages: percentiles are easy to compute and understand, can be used with any type of examinee and are suitable for any type of test. 

The primary drawback of using a raw-score-to-percentile conversion is the resulting inequality of units, especially at the extremes of the distribution of scores. For example, in a normal distribution, scores cluster near the mean and decrease in frequency the farther one departs from the mean.

In the transformation to percentiles, raw score differences near the centre of the distribution are exaggerated: small raw score differences may lead to large percentile differences.

This is especially the case when a large proportion of examinees receive the same or similar scores, causing a one- or two-point raw score difference to result in a 10- or 15-unit percentile difference.

Short tests with a limited number of possible raw scores often result in a clustering of scores. The resulting effect on tables of selected percentiles is “gaps” in the table corresponding to points in the distribution where scores cluster most closely together.
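The inequality of units can be seen with a small, invented example. The sketch below builds a raw-score-to-percentile table for a short test on which most of 20 hypothetical examinees score 5 or 6; the function name `percentile_table` is an assumption for illustration only.

```python
from collections import Counter

def percentile_table(scores):
    """Map each observed raw score to the percentage of scores at or below it."""
    n = len(scores)
    counts = Counter(scores)
    table = {}
    cumulative = 0
    for raw in sorted(counts):
        cumulative += counts[raw]
        table[raw] = 100.0 * cumulative / n
    return table

# Hypothetical short test: 20 examinees, scores clustered at 5 and 6
scores = [3] * 1 + [4] * 2 + [5] * 7 + [6] * 7 + [7] * 2 + [8] * 1
for raw, pct in percentile_table(scores).items():
    print(raw, pct)
```

With these made-up data, moving from a raw score of 5 to 6 jumps from the 50th to the 85th percentile, while moving from 7 to 8 changes the percentile by only 5 units. No raw score corresponds to any percentile between 50 and 85, which is the kind of gap the paragraph above describes.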


Raw score

The first quantitative result obtained in scoring a test. Examples include the number of right answers, the number right minus some fraction of the number wrong, the time required for performance, the number of errors, or similar direct, unconverted, uninterpreted measures.
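One example of "the number right minus some fraction of the number wrong" is the conventional correction for guessing on multiple-choice items, where the fraction is 1 divided by one less than the number of answer options. The sketch below assumes that convention; the function name and figures are illustrative, and individual tests may use a different rule or none at all.

```python
def corrected_raw_score(num_right, num_wrong, options_per_item):
    """Number right minus wrong/(k - 1), where k is the number of answer options.

    Omitted items count as neither right nor wrong.
    """
    if options_per_item < 2:
        raise ValueError("items need at least two answer options")
    return num_right - num_wrong / (options_per_item - 1)

# Hypothetical result: 30 right, 8 wrong, five-option items
print(corrected_raw_score(30, 8, 5))  # 30 - 8/4 = 28.0
```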



Reliability

The extent to which a test is consistent in measuring whatever it is intended to measure; dependability, stability, trustworthiness, relative freedom from errors of measurement. Reliability is usually expressed by some form of reliability coefficient or by the standard error of measurement derived from it.
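The standard error of measurement (SEM) is commonly derived from the reliability coefficient as the SD multiplied by the square root of one minus the reliability. A minimal sketch of that formula follows; the function name and the example values (SD of 15, reliability of 0.91) are assumptions chosen for illustration.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    if sd < 0:
        raise ValueError("SD cannot be negative")
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability coefficient must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)

# Hypothetical scale with SD 15 and a reliability coefficient of 0.91
sem = standard_error_of_measurement(15, 0.91)
print(round(sem, 2))  # approximately 4.5 scale points
```

The higher the reliability, the smaller the SEM, and so the narrower the band of uncertainty around an obtained score.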


Standard deviation (SD)

A measure of the variability or dispersion of a distribution of scores. 

The more the scores cluster around the mean, the smaller the standard deviation. For a normal distribution, approximately two thirds (68.27%) of the scores are within the range from one SD below the mean to one SD above the mean. Computation of the SD is based upon the square of the deviation of each score from the mean.
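The computation described above can be sketched as follows. This version divides by the number of scores (the population SD); dividing by one less than the number of scores is the usual alternative for a sample. The function names and the score list are made up for the example, and the second function uses the error function from Python's standard library to check the share of a normal distribution lying within a given number of SDs of the mean.

```python
import math

def standard_deviation(scores):
    """Population SD: square root of the mean squared deviation from the mean."""
    if not scores:
        raise ValueError("need at least one score")
    n = len(scores)
    mean = sum(scores) / n
    return math.sqrt(sum((s - mean) ** 2 for s in scores) / n)

def normal_share_within(k):
    """Proportion of a normal distribution within k SDs either side of the mean."""
    return math.erf(k / math.sqrt(2))

# Hypothetical set of eight raw scores with a mean of 5
print(standard_deviation([2, 4, 4, 4, 5, 5, 7, 9]))
print(round(100 * normal_share_within(1), 2))  # percentage within one SD of the mean
```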



Validity

The extent to which a test does the job for which it is used. This definition is more satisfactory than the traditional “extent to which a test measures what it is supposed to measure,” since the validity of a test is always specific to the purposes for which the test is used.

We hope that this helps you to understand some of the technical terms that may get thrown around!


Anisa Zulfiqar


This article originally appeared on the TalentLens Australia / New Zealand website.
