Are Tests Reliable?


Argument in Defence of Four Letter Tests

This argument refers to high quality tests. Obviously there will be high and low quality tests and there is no reason to expect accuracy from a low quality test. How one evaluates the quality of a test is another issue entirely but this argument will assume that the test in question is carefully designed by an expert in the field and consists of comprehensible and unambiguous items.

The most obvious advantage of a test over self-analysis is that it relies on expert knowledge and presents that knowledge in an easily understandable form so that the subject can quickly receive an assessment without needing to become an expert themselves (or pay for someone else's expertise). The alternative is to invest a great deal of time into studying a system, based on the assumption that taking a quick test cannot possibly yield accurate results. Before investing this time, newcomers to the Four Letter would be justified in wanting to see empirical evidence that supports the claim that tests are less reliable than self-analysis. It is not clear that such evidence exists.

Another significant advantage is that tests are effectively calculators, sometimes performing and balancing extremely complicated calculations in order to provide the subject with a highly specific numerical breakdown of their assessment. Let's say that we are dealing with 8 factors (in a hypothetical system that resembles the Four Letter) and each factor has several diagnostic criteria. Some of these criteria may apply to more than one of the 8 factors (e.g. "sympathy" may be an indicative trait for 3 out of 8 factors). Some of the criteria may be weighted differently (e.g. weighting dom vs aux functions when calculating the overall probability of a Four Letter type). Some criteria will be mapped inversely (e.g. being "detail oriented" may increase the probability of one factor and decrease the probability of another factor). Each criterion can be scored in a nuanced manner (e.g. Agree strongly = +2, Slightly disagree = -1, Neither agree nor disagree = 0). The probability that the subject's #1 factor is "factor A" and not any of the other 7 factors can only be accurately determined by summing the subject's weighted scores for every single diagnostic criterion and ranking the factors by their totals. What tests and self-analysis have in common is that the overall probability of a given result can only be accurately worked out by balancing all of these subtle, complicated and interacting calculations. The difference is that the numerical complexity, specificity and proportional clarity of a test is unlikely to be matched by self-analysis. While the accuracy of a subject's answers to test questions may be in doubt, the far greater resolution of test-based calculations is self-evident.
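The kind of calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the factor labels (A to F), the criteria, their weights and the two answer labels not mentioned above ("Slightly agree", "Disagree strongly") are all hypothetical, and no real Four Letter test is claimed to work this way. A criterion may feed several factors, carry different weights, or map inversely through a negative weight.

```python
from collections import defaultdict

# Answer scale. Only +2, -1 and 0 come from the text above;
# the other two labels are assumed for symmetry.
LIKERT = {
    "Agree strongly": 2,
    "Slightly agree": 1,
    "Neither agree nor disagree": 0,
    "Slightly disagree": -1,
    "Disagree strongly": -2,
}

# Hypothetical criteria: each maps to {factor: weight}.
# A negative weight is an inverse mapping.
CRITERIA = {
    "sympathy": {"A": 1.0, "B": 1.0, "C": 0.5},   # indicative of 3 factors
    "detail oriented": {"D": 1.0, "E": -1.0},     # raises D, lowers E
    "enjoys debate": {"F": 1.0},
}


def score_factors(answers, criteria=CRITERIA, scale=LIKERT):
    """Sum weighted answer values into a total per factor."""
    totals = defaultdict(float)
    for criterion, answer in answers.items():
        value = scale[answer]
        for factor, weight in criteria[criterion].items():
            totals[factor] += value * weight
    return dict(totals)


def rank_factors(totals):
    """Order factors from highest to lowest total (ties: alphabetical)."""
    return sorted(totals, key=lambda factor: (-totals[factor], factor))


answers = {
    "sympathy": "Agree strongly",
    "detail oriented": "Slightly disagree",
    "enjoys debate": "Neither agree nor disagree",
}
totals = score_factors(answers)
ranking = rank_factors(totals)
```

With these made-up answers, factors A and B tie for first place, which a real test would need some further rule to resolve; here the tie is simply broken alphabetically.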

Critics of tests may claim that test results will vary depending on the moment that the subject takes the test, but this criticism is only valid for people who take a test only once. For a subject who takes a test 2 or 3 times at intervals of weeks or months and then averages the results, this critique is invalid. The same criticism also applies to self-analysis. Four Letter users adjusting and doubting their self-assessments on a day-to-day basis is a common sight on PDB's forums (e.g. "recently I've been thinking that I don't actually use Te all that much" etc.).
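Averaging repeated sittings is simple arithmetic. A minimal Python sketch, assuming each sitting yields one numerical score per factor (the function name, factor labels and numbers are invented for illustration):

```python
from statistics import mean


def average_sittings(sittings):
    """Average each factor's score across several sittings of the same test."""
    factors = sittings[0].keys()
    return {factor: mean(s[factor] for s in sittings) for factor in factors}


# Three hypothetical sittings taken weeks or months apart.
sittings = [
    {"A": 4, "B": 1},
    {"A": 2, "B": 3},
    {"A": 3, "B": 2},
]
averaged = average_sittings(sittings)
```

In this made-up case the second sitting, taken alone, would have ranked B above A, while the average across all three sittings ranks A first.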

Concerns that test results are biased because the test taker is biased are also invalid. Bias can still exist in the mind of an expert who analyses herself. It can also exist in the mind of a practitioner who analyses a client. The client too may be offering biased and inaccurate answers to the practitioner. This critique indicates a problem with all human-conducted analysis and offers no reason to assume that tests are any less reliable than the alternatives.

The validity of criticising a test's lack of nuance will vary depending on the quality of the tests. Low quality tests may ask very general questions that draw conclusions from answers that could have multiple different underlying reasons. High quality tests will be much more targeted in their questions and calculations and will avoid scoring subjects on the basis of multifaceted issues.

Finally, it must also be acknowledged that the dominant form of personality analysis amongst scientists is the Big 5, which comes with the assumption (and statistical evidence) that tests are reliable. If personality tests have been deemed by the scientific community to be reliable, what is the basis for the claim that Four Letter tests are not reliable? Either there is something distinct to the Four Letter that cannot be captured in a test, or critics of personality tests are claiming that Four Letter self-analysis is more accurate than the scientific community's most widely utilised form of personality analysis - a self-evidently dubious claim.

In summary: it is not obvious that the popular assumption that Four Letter tests are less reliable than other methods of determining a Four Letter type has any concrete basis in reality. One hardly needs to cite the once popular belief in a flat Earth in order to indicate the pitfalls of holding strong convictions with no scientific basis. Critics of Four Letter tests need to centre their arguments around 2 points:
1. A clarification of the specific reason why tests can be assumed to be reliable in regard to the Big 5 but not in regard to the Four Letter system.
2. An argument that specifically illustrates why some proposed alternative is superior to taking the test. The argument needs to clarify why a person who simply wishes to know their Four Letter type should invest much time and effort into learning a system (or invest money in paying someone who knows the system) rather than simply taking a test. Merely criticising tests in isolation is insufficient. Critiques that apply to both Four Letter tests and their alternatives do nothing to strengthen the overall argument.

[This argument needs to be balanced by an argument in favour of the alternatives to tests (e.g. self-analysis) with appropriate responses to the points made in this argument]

Argument in Defence of the Alternatives to Four Letter Tests

[This argument really needs to be written by someone who subscribes to this position]


16personalities does not test for the Four Letter type. It actually tests for Big 5 constructs. Because it uses the four-letter dichotomies and is the first result for a Four Letter test, it is misleading. They say their test is “valid”, but nowhere on the site do they say they are affiliated with anything MBTI, and tucked away on the site they outright say they don’t use any Jungian concepts; they just use the letters. That is why their types carry an -A or -T suffix, depending on one's level of Neuroticism: they had no letter to map that dimension to.

“We use the acronym format introduced by Myers-Briggs for its simplicity and convenience, with an extra letter to accommodate five rather than four scales. However, unlike Myers-Briggs or other theories based on the Jungian model, we have not incorporated Jungian concepts such as cognitive functions, or their prioritization. Jungian concepts are very difficult to measure and validate scientifically, so we’ve instead chosen to rework and rebalance the dimensions of personality called the Big Five personality traits, a model that dominates modern psychological and social research.”

Now that’s not to say there aren’t correlations to Four Letter types, but correlation does not equal causation and there absolutely are exceptions to the norm. The two are also founded on completely different systems, so what one system says does not necessarily carry over to the other. The Four Letter system is based on Jungian cognitive functions. The Big 5 is based on the personality traits Extraversion, Agreeableness, Conscientiousness, Neuroticism, and Openness to Experience. They are not the same. If the only thing separating INFJ (Ni(Fe-Ti)Se) from INFP (Fi(Ne-Si)Te) is level of Conscientiousness, there's a problem, since INFJ and INFP have completely different approaches to cognizing the world. It is not internally valid.

Sakinorva / Keys2Cognition

[talk about the issue with ordering your use of cognitive functions (i.e. "Ne > Se > Ti > Fe > Te"... etc) and how it can lead to a misunderstanding of the concepts]

Let's say, for example, that you score high on Ne: it's likely you will also score high on Se. That does not mean you use both. The reason is that Ne and Se are both extraverted perceiving functions (Pe), so they serve the same purpose, just in different ways. What appeals to an Ne dominant type will likely also appeal to an Se dominant type. That’s because these tests are testing for emergent properties; e.g. “do thrills make you feel alive?” is not exclusive to Se, but it is scored as though it were. The calculation does not account for emergent properties that are attributable to more than one function in different ways, which is part of the inherent limitation and risk of getting your type from a test.


Ways to group types

One thing that should be brought up in the misinterpretations article is the Keirsey temperaments. Keirsey rejected the cognitive functions and was a behaviorist, an approach that has no place in the psychodynamic Jungian school of thought. Yet his groupings of the types remain the most popular way to group them due to the popularity of 16personalities.
He groups the types as follows: SJ, SP, NT, NF


This is incompatible with modern Jungian typology, since it distributes the types haphazardly. The issue is that the NT and NF groups mix both function axes (both Je/Ji and Pe/Pi), while the SJ and SP groups mix only one: Je/Ji. A scheme consistent with SP and SJ would be NJs, SJs, SPs, and NPs. A scheme consistent with NT and NF would be NTs, NFs, STs, and SFs.

Instead, here are other ways to group types that are internally consistent and are based upon the functions:

By the first perceiving function in the stack

NJs (conscious Ni users): INTJ, INFJ, ENTJ, ENFJ
SJs (conscious Si users): ISTJ, ISFJ, ESTJ, ESFJ
SPs (conscious Se users): ESFP, ESTP, ISFP, ISTP
NPs (conscious Ne users): ENFP, ENTP, INFP, INTP

By the first judging function in the stack and the attitude of the first perceiving function


By the dominant/inferior axis role

IJs (Pi dominant, Pe inferior): INTJ, INFJ, ISTJ, ISFJ
EPs (Pe dominant, Pi inferior): ESFP, ESTP, ENFP, ENTP
IPs (Ji dominant, Je inferior): ISFP, ISTP, INFP, INTP
EJs (Je dominant, Ji inferior): ENTJ, ENFJ, ESTJ, ESFJ

In addition to this, we may make a further split between these groups: IJs and EPs are irrational types (they primarily apprehend the world through the lens of perceptions), while IPs and EJs are rational types (they primarily apprehend the world through the lens of judgements).
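The function-based groupings listed above follow mechanically from each type's function stack, which a short Python sketch can show. It assumes the four-function stack model implied by the stacks quoted earlier, e.g. INFJ as Ni(Fe-Ti)Se; the helper names (function_stack, first_perceiver, dominant_role, group_by) are hypothetical and chosen only for illustration.

```python
from itertools import product

# Opposite pole of each function letter and each attitude.
OPPOSITE = {"N": "S", "S": "N", "T": "F", "F": "T", "e": "i", "i": "e"}


def flip(function):
    """Return the opposite function in the opposite attitude, e.g. Fe -> Ti."""
    return OPPOSITE[function[0]] + OPPOSITE[function[1]]


def function_stack(code):
    """Derive [dominant, auxiliary, tertiary, inferior] from a four-letter code."""
    attitude, perceiving, judging, orientation = code
    # The last letter says which of the two preferred functions is extraverted.
    if orientation == "J":
        extraverted, introverted = judging + "e", perceiving + "i"
    else:
        extraverted, introverted = perceiving + "e", judging + "i"
    # The dominant function shares the attitude given by the first letter.
    if attitude == "E":
        dominant, auxiliary = extraverted, introverted
    else:
        dominant, auxiliary = introverted, extraverted
    return [dominant, auxiliary, flip(auxiliary), flip(dominant)]


ALL_TYPES = ["".join(letters) for letters in product("EI", "NS", "TF", "JP")]


def first_perceiver(code):
    """First perceiving function (S or N) in the stack, e.g. 'Ni' for ENTJ."""
    return next(f for f in function_stack(code) if f[0] in "NS")


def dominant_role(code):
    """Kind and attitude of the dominant function: 'Pi', 'Pe', 'Ji' or 'Je'."""
    dominant = function_stack(code)[0]
    kind = "P" if dominant[0] in "NS" else "J"
    return kind + dominant[1]


def group_by(key):
    """Group all 16 types by the value that key() returns for each type."""
    groups = {}
    for code in ALL_TYPES:
        groups.setdefault(key(code), []).append(code)
    return groups


by_perceiver = group_by(first_perceiver)   # keys: Ni, Si, Se, Ne
by_role = group_by(dominant_role)          # keys: Pi, Pe, Ji, Je
```

Under these assumptions, by_perceiver["Ni"] contains the four NJ types and by_role["Pi"] contains the four IJ types, matching the lists above.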

By type families


[Naming conventions for these? I don't want to insinuate anything by giving them nicknames; I just like group A, B, C, D]

Functions as tools

If you took all eight functions and had them apprehend the same thing, they would all process the same thing through a different lens. In Motes & Beams (pg. 21), Michael Pierce illustrates this with an example of how all eight functions would process the same event, and through what lens each would view the incident.

Each of the four basic functions appears as modified by an attitude of extraversion or introversion. The eight resulting functions of personality are as follows:

Extraverted Sensation — perceiving what is objectively denoted; e.g. “Colonel Mustard did in fact murder Mr. Body with a candlestick.”
Introverted Sensation — perceiving what is subjectively denoted; e.g. “This is how I witnessed Mr. Body’s murder.”
Extraverted Intuition — perceiving what is objectively connoted; e.g. “There are a number of possible reasons for Colonel Mustard to murder Mr. Body.”
Introverted Intuition — perceiving what is subjectively connoted; “I perceive that this is the real reason why Colonel Mustard murdered Mr. Body.”
Extraverted Feeling — judgement based on what is objectively connoted; “Despite our differences, we can all agree that what Colonel Mustard did is evil.”
Introverted Feeling — judgement based on what is subjectively connoted; e.g. “I judge, of myself alone, that what Colonel Mustard did is evil.”
Extraverted Thinking — judgement based on what is objectively denoted; e.g. “We all see the evidence (bloody hands, scene of the crime, etc.) therefore, we must all conclude that Colonel Mustard is Mr. Body's murderer.”
Introverted Thinking — judgement based on what is subjectively denoted; e.g. “If you accept my definition of 'murderer', then you must conclude with me that Colonel Mustard is one.”

Written and maintained by PDB users for PDB users.