The 4 differences between reliability and validity (in science)

March 23, 2024

Because they have very similar meanings in everyday language, the terms reliability and validity are easy to confuse when we talk about science and, specifically, about psychometrics.

In this text we set out to clarify the main differences between reliability and validity. Hopefully you will find it useful for resolving this common confusion.

What is reliability?

In psychometrics, the concept of "reliability" refers to the precision of an instrument; specifically, reliability coefficients tell us about the consistency and stability of the measurements taken with that tool.

The greater the reliability of an instrument, the fewer random, unpredictable errors will appear when it is used to measure a given attribute. The concept of reliability does not cover predictable (systematic) errors, that is, those that are subject to experimental control.

According to classical test theory, reliability is the proportion of the variance in observed scores that is explained by the true scores. In this model, the raw (observed) score on a test is the sum of the true score and the random error.
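
The decomposition described above can be illustrated with a small simulation. This is only a sketch under assumed values: the score distributions, the sample size, and all variable names are invented for illustration, and in real testing the true scores are never observed directly.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

n_people = 100_000
# Assumed, made-up distributions: true scores with SD 10, random error with SD 5.
true_scores = rng.normal(loc=50.0, scale=10.0, size=n_people)
random_error = rng.normal(loc=0.0, scale=5.0, size=n_people)

# Classical test theory: observed score = true score + random error.
observed = true_scores + random_error

# Reliability = share of observed-score variance due to true scores.
reliability = true_scores.var() / observed.var()
print(round(reliability, 2))
```

With these assumed standard deviations the ratio should come out close to 100 / (100 + 25) = 0.8; shrinking the error variance pushes it towards 1.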

The two main components of reliability are temporal stability and internal consistency. The first indicates that scores change little when measured on different occasions, while internal consistency refers to the degree to which the items that make up the test measure the same psychological construct.

Therefore, a high reliability coefficient indicates that test scores fluctuate little, both internally and over time; in short, that the instrument is largely free of random measurement error.
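
One common way to put a number on internal consistency is Cronbach's alpha. The text does not name this coefficient, so take the following as an illustrative sketch rather than the method the article has in mind; the function name and the simulated item data are assumptions.

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a respondents-by-items score matrix."""
    item_scores = np.asarray(item_scores, dtype=float)
    n_items = item_scores.shape[1]
    item_variances = item_scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = item_scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return float(n_items / (n_items - 1) * (1 - item_variances.sum() / total_variance))

# Simulated data (assumption): 500 respondents, 10 items,
# each item = shared trait + independent noise.
rng = np.random.default_rng(1)
trait = rng.normal(size=(500, 1))
items = trait + rng.normal(size=(500, 10))
print(round(cronbach_alpha(items), 2))
```

The more strongly the items share a common source of variance, the closer alpha gets to 1.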

Definition of validity

When we speak of validity, we refer to whether the test correctly measures the construct it intends to measure. In practice it is quantified as the relationship between the score obtained on a test and another related measure; the degree of linear correlation between the two is the validity coefficient.

In scientific research, validity also refers to the degree to which the results obtained with a given instrument or in a study can be generalized.

There are different types of validity, which depend on how it is assessed; this makes it a term with quite different meanings. Broadly, we can distinguish between content validity, criterion (or empirical) validity, and construct validity.

Content validity describes the extent to which the items of a psychometric test are a representative sample of the elements that make up the construct being assessed. The instrument must cover all the fundamental aspects of the construct; for example, an adequate test of depression must include items that assess mood and loss of pleasure.

Criterion validity reflects the ability of the instrument to predict outcomes related to the trait or area of interest. Finally, construct validity concerns whether the test measures what it intends to measure, judged for example by how well its scores converge with those obtained on similar tests.
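
As a minimal sketch of a validity coefficient, the linear correlation between test scores and a related measure can be computed with NumPy's `corrcoef`. The function name `validity_coefficient` and the six pairs of scores are made up purely for illustration.

```python
import numpy as np

def validity_coefficient(test_scores, criterion_scores):
    """Pearson correlation between test scores and a related (criterion) measure."""
    test_scores = np.asarray(test_scores, dtype=float)
    criterion_scores = np.asarray(criterion_scores, dtype=float)
    return float(np.corrcoef(test_scores, criterion_scores)[0, 1])

# Invented scores for six people, for illustration only.
test = [12, 15, 9, 20, 17, 11]
criterion = [30, 34, 25, 41, 39, 27]
print(validity_coefficient(test, criterion))
```

A value near 1 would mean that people who score high on the test also tend to score high on the related measure.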

Differences between reliability and validity

Although these two psychometric properties are closely related, they refer to clearly distinct aspects. Let's see what the differences are.

1. The object of analysis

Reliability is a characteristic of the instrument, in the sense that it describes the properties of the items that make it up. Validity, on the other hand, does not refer strictly to the instrument but to the generalizations made from the results obtained with it.

2. The information they provide

Although it is a somewhat simplistic way of putting it, it is often said that validity indicates whether a psychometric tool actually measures the construct it intends to measure, while reliability refers to whether it measures it precisely, with little random error.

3. The way in which they are calculated

Three procedures are mainly used to estimate reliability: the split-half method, parallel forms, and test-retest. The most widely used is the split-half method, in which the items are divided into two groups once the test has been answered and the correlation between the two halves is then analyzed.

The parallel (or alternate) forms method consists of creating two equivalent tests and measuring how strongly the scores on them correlate. Test-retest simply consists of administering the same test twice, under conditions as similar as possible. These two procedures can be combined, giving rise to test-retest with parallel forms, in which a time interval is left between administering the first form of the test and the second.

Validity, for its part, is calculated in different ways depending on the type, but in general all methods are based on comparing the score on the test in question with other data from the same subjects on similar traits; the aim is for the test to act as a predictor of the trait.

Among the methods used to evaluate validity are factor analysis and the multitrait-multimethod matrix technique. Content validity, in turn, is often determined by rational, non-statistical analyses; for example, it includes face validity, which refers to the subjective judgment of experts on the validity of the test.
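
The split-half procedure can be sketched as follows. Two choices here are assumptions not stated in the text: the items are split into odd and even positions, and the correlation between halves is stepped up with the Spearman-Brown formula, a standard correction for the fact that each half is only half as long as the full test. The data are simulated.

```python
import numpy as np

def split_half_reliability(item_scores):
    """Odd-even split-half correlation with the Spearman-Brown correction.

    item_scores: 2-D array, one row per respondent, one column per item.
    """
    item_scores = np.asarray(item_scores, dtype=float)
    half_a = item_scores[:, 0::2].sum(axis=1)  # items 1, 3, 5, ...
    half_b = item_scores[:, 1::2].sum(axis=1)  # items 2, 4, 6, ...
    r_halves = np.corrcoef(half_a, half_b)[0, 1]
    # Spearman-Brown: estimate the reliability of the full-length test.
    return float(2 * r_halves / (1 + r_halves))

# Simulated data (assumption): 500 respondents, 10 items,
# each item = shared trait + independent noise.
rng = np.random.default_rng(1)
trait = rng.normal(size=(500, 1))
items = trait + rng.normal(size=(500, 10))
print(round(split_half_reliability(items), 2))
```

Other ways of splitting the items (for example, at random) give slightly different estimates, which is one known limitation of the method.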
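
A test-retest coefficient is simply the correlation between the scores from the two administrations. The sketch below simulates that situation under two assumptions that real data may not meet: each person's true score stays the same between occasions, and the errors on the two occasions are independent. All names and numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed so the sketch is reproducible
n_people = 2_000

# Assumed stable true scores, measured twice with independent random error.
true_scores = rng.normal(50.0, 10.0, size=n_people)
first_administration = true_scores + rng.normal(0.0, 5.0, size=n_people)
second_administration = true_scores + rng.normal(0.0, 5.0, size=n_people)

# Test-retest reliability: correlation between the two sets of scores.
test_retest_r = float(np.corrcoef(first_administration, second_administration)[0, 1])
print(round(test_retest_r, 2))
```

With parallel forms the computation is the same; the only difference is that the second column of scores comes from the equivalent form rather than from the same test.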

4. The relationship between both concepts

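The reliability of a psychometric instrument limits its validity: the more reliable it is, the greater the validity it can reach, although high reliability does not by itself guarantee validity. In classical test theory, a validity coefficient cannot exceed the square root of the reliability coefficient, so validity coefficients are in practice usually lower than reliability coefficients, and a high validity coefficient indirectly tells us that the instrument is also reliable.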
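
The way reliability caps validity can be made concrete with the classical test theory bound: the observed correlation between a test and a criterion cannot exceed the square root of the product of their reliabilities. The helper below is a hypothetical illustration of that bound, not a function from any psychometrics library.

```python
import math

def max_validity(test_reliability, criterion_reliability=1.0):
    """Upper bound on the test-criterion correlation under classical test theory."""
    if not (0.0 <= test_reliability <= 1.0 and 0.0 <= criterion_reliability <= 1.0):
        raise ValueError("reliabilities must lie in [0, 1]")
    return math.sqrt(test_reliability * criterion_reliability)

# Ceiling on validity for a few test reliabilities, assuming a perfectly
# reliable criterion (criterion_reliability defaults to 1.0).
for reliability in (0.5, 0.7, 0.9):
    print(reliability, round(max_validity(reliability), 3))
```

The bound only says how high validity could be; a very reliable test can still have low validity if it consistently measures something other than the intended construct.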