Posted 19 Aug 2025
Online Neurodiversity Testing and Assessments: A Health Warning
Author: Dr. Nancy Doyle, PhD, CPsychol
Founder & Chief Science Officer, Genius Within CIC
Co-Director, Centre for Neurodiversity at Work, Birkbeck, University of London
In an era where every man and his dog thinks they can get rich quick with online neurodiversity testing and assessments, quality counts, and most people don’t know what they don’t know. Buyer beware!
Neurodiversity testing and assessment design is a science. It’s called “psychometrics”: we use statistical analysis to check whether the test measures what we want it to measure, and whether it does so reliably. There are internationally agreed standards for this process. It’s not something that you can undertake easily, even with professional expertise. Even highly qualified, highly ethical people can overlook the need for reliability and validity testing and make mistakes.
So let me enlighten you with some basics, to help you sort the good products from the snake oil and bust some of the myths I’ve heard in circulation.
Lesson one: the normal distribution
The normal distribution is a weird phenomenon of nature, where any sequential, numerical range you can think of (height, weight, anxiety levels, cognitive ability) will add up to a distribution of scores in which roughly 68% of people score within one standard deviation of the average. The standard deviation is the average distance from the average score. In a normal distribution, 95% of people will score within two standard deviations and 99.7% of people within three.
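For the statistically curious, here is a minimal sketch in Python of that 68/95/99.7 pattern. The scores are simulated, and the mean of 100 and standard deviation of 15 are purely illustrative, not taken from any real test:

```python
import numpy as np

# Simulate 10,000 test scores drawn from a normal distribution
# (mean 100, standard deviation 15 -- illustrative values only).
rng = np.random.default_rng(seed=42)
scores = rng.normal(loc=100, scale=15, size=10_000)

mean, sd = scores.mean(), scores.std()

# Proportion of scores falling within 1, 2 and 3 standard deviations of the mean.
for k in (1, 2, 3):
    within = np.mean(np.abs(scores - mean) <= k * sd)
    print(f"Within {k} SD: {within:.1%}")   # roughly 68%, 95%, 99.7%
```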
If your test does not have a normal distribution of scores, then you usually have one of two problems:
- You haven’t collected scores from enough people to have a representative sample.
And if your sample isn’t representative, you shouldn’t make decisions about the value of the test.
- The test doesn’t capture the full range of responses.
If you have enough people but they are all scoring the same way (for example, if 90% of people score within one standard deviation rather than 68%), it usually means some of the answers in your test are pointless and you aren’t measuring anything meaningful.
Here are four graphs of score distribution which should alarm you, and I will explain why:
Skew
Positive Skew:

Negative Skew:

Skew, in an ability test, might mean the test is too easy, or too hard for most people. It won’t be sensitive enough to capture the full range of human experience.
In a self-rating test it might mean that people feel uncomfortable answering high or low, which might be because of social desirability – they don’t want to admit to negative feelings, or they want to align to a socially correct answer.
Leptokurtic

What is the point here of the upper and lower scores? What will a score here tell us, other than that we can answer a question just like everyone else?
It might mean that people have gotten bored with the test and are just hitting the middle score each time.
It’s like you’ve asked a question with only five possible answers, but if you present the result on a scale of twenty-five, you are misleading people into thinking they are at a level of something, even though there is no level.
Platykurtic

With too many extreme scores, we can’t judge who is doing well and who is not. It might mean that the scores depend on chance rather than measuring an underlying construct.
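If you want a quick numerical check on the shape of a score distribution, skewness and kurtosis statistics will flag these problems. Here is a minimal sketch using simulated data only (none of these numbers come from a real test): skew near zero and excess kurtosis near zero is the healthy bell shape, strongly positive kurtosis is leptokurtic, and strongly negative kurtosis is platykurtic.

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(seed=0)

# Hypothetical score distributions, simulated only to show the checks.
normal_scores = rng.normal(50, 10, size=2_000)          # roughly bell-shaped
skewed_scores = rng.exponential(scale=10, size=2_000)   # piled up at one end (positive skew)
flat_scores = rng.uniform(0, 100, size=2_000)           # too many extreme scores (platykurtic)

for name, s in [("normal", normal_scores),
                ("skewed", skewed_scores),
                ("flat", flat_scores)]:
    # Skew ~0 and excess kurtosis ~0 are what a healthy, normal-shaped
    # distribution looks like; large departures are the warning signs above.
    print(f"{name:>7}: skew={skew(s):+.2f}, excess kurtosis={kurtosis(s):+.2f}")
```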
Here’s an example of the distribution of scores from the Genius Finder™ online neurodiversity testing and assessment tool, which is normally distributed. This is from a sample of about 2000 people.

I’ve heard it said that neurodivergent people don’t score in the normal range, and so statistics like this don’t apply to them. Not true. In a sample of neurodivergent people, the scores will still be normally distributed within that group, even though the range might differ from the range of neurotypical scores.
Here’s a comparison of neurodivergent and neurotypical scores in the Genius Finder™ neurodiversity testing and assessment tool to show you what I mean.

This graph shows that both data sets are normally distributed. The neurodivergent sample sits more in the middle, because the Genius Finder™ was designed for us, but the neurotypical sample is still the right shape, even if it tends towards the higher scores.
The normal distribution is not biased, or rigged. It’s just a plain, boring fact, like Pythagoras’ Theorem or 2+2=4. Yes, it has been used to exclude or to diminish, for example when measuring types of cognitive ability, but that’s human interpretation, not the maths itself. If we privilege one type of cognitive ability over other forms of human value, that’s our own self-sabotage and limited creativity!
Lesson two: reliability
There are lots of ways to measure reliability, for example stability over time, but which one you use depends on the test and what it is for. If you expect scores to change as people develop, you wouldn’t check back in with the same people expecting identical results.
An absolute basic is the internal consistency score.
So if you have a test like the Genius Finder™, which measures confidence in 13 core workplace skills, you would expect each person to score more or less the same on every question within a given skill.
For example, I am pretty sure of my skills in numeracy, so in the Genius Finder™ neurodiversity testing and assessment tool, I typically score 4 or 5 out of 5 for each question in that category. Someone with maths anxiety might score 1 or 2. Someone in the middle might score 2,3,4.
However, if my scores for numeracy include some 1’s and 2’s as well as 4’s and 5’s, that shows inconsistency. That might be fine for the odd person, but if most of the people taking the test are inconsistent in the same style, it shows that the test is not measuring one thing. It might be that questions one and two are tapping into a different type of numeracy than questions three and four, which means that ‘numeracy’ isn’t a good description of what we are measuring and we are misleading people. If the scores are random, it shows that people aren’t concentrating and the test isn’t being taken seriously. Both are a problem if we want to make decisions based on the results.
Reliability is measured numerically: we do a correlation analysis to see the strength of the relationship between the questions in each category. If the correlation is .6 or above, this is okay; .7 to .95 is strong; and above .95 is too strong! Too strong is bad because it means the questions are too similar, and we aren’t capturing different aspects of the same construct. So, for example, we might just be asking about addition, rather than addition, percentages, speed and size, which are all aspects of numeracy.
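For anyone who wants to see how the arithmetic works, here is a minimal sketch of an internal consistency check on simulated ratings for one hypothetical ‘numeracy’ category. It computes the inter-item correlations and Cronbach’s alpha, one common way of summarising them (the exact statistic used for a given test is a detail its technical manual should specify); the data are made up purely so the example runs.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Internal consistency for a (respondents x questions) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 1-5 ratings from 200 people on four numeracy questions,
# randomly generated here purely to make the example run.
rng = np.random.default_rng(seed=1)
true_numeracy = rng.normal(3, 1, size=(200, 1))
ratings = np.clip(np.round(true_numeracy + rng.normal(0, 0.7, size=(200, 4))), 1, 5)

# Correlations between the questions in the category, and the overall alpha.
corr = np.corrcoef(ratings, rowvar=False)
print("inter-item correlations:\n", corr.round(2))
print("Cronbach's alpha:", round(cronbach_alpha(ratings), 2))  # consistent simulated answers land in the strong range
```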
Lesson three: validity
Again, there are lots of ways to measure validity. You might want to see whether your test correlates well with other tests of the same style. For example, an online ability test should correlate well with a well-established IQ test like the Wechsler Adult Intelligence Scale. In our case, we use the Genius Finder™ to help neurodivergent people identify which areas of their work they find challenging so that we can recommend solutions. So we expect it to correlate well with neurodivergent diagnoses, even though it doesn’t diagnose. Here’s a graph showing the relationships.

As you can see, they are all where you would expect – dyscalculic people score lower in numeracy-related tasks, dyspraxic people score lower in motion and balance. Neurodivergent people as a whole score higher in creativity, because of course we do. 😊
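To show the mechanics of that kind of validity check, here is a minimal sketch using made-up data: a hypothetical dyscalculia indicator alongside numeracy-confidence scores, related with a point-biserial correlation. None of this is the real Genius Finder™ dataset.

```python
import numpy as np
from scipy.stats import pointbiserialr

rng = np.random.default_rng(seed=2)

# Hypothetical data: 1 = has a dyscalculia diagnosis, 0 = does not,
# alongside each person's numeracy-confidence score (1-5 scale).
dyscalculia = rng.integers(0, 2, size=500)
numeracy = np.clip(rng.normal(3.5 - 1.2 * dyscalculia, 0.8), 1, 5)

# A negative, statistically reliable correlation is what you would expect
# if the scale really taps numeracy-related difficulty.
r, p = pointbiserialr(dyscalculia, numeracy)
print(f"point-biserial r = {r:.2f}, p = {p:.4f}")
```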
The fundamental measure of validity is factor analysis. In this analysis, you ask a computer program to review the correlations between questions for a large sample and, based on which questions score similarly, it will tell you what the categories are. You are seeing if it can find all the numeracy questions and group them together as one factor, or all the memory questions. If it can do that, you have good construct validity, which means the test is measuring what you intend it to measure.
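Here is a minimal sketch of that idea with simulated data: six made-up questions are generated from two hidden traits, and an exploratory factor analysis is asked to recover the grouping. This is illustrative only, not the Genius Finder™ analysis itself.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(seed=3)
n = 1_000

# Two hidden traits drive the answers: questions 1-3 tap "numeracy",
# questions 4-6 tap "memory" (a made-up structure, purely illustrative).
numeracy = rng.normal(size=(n, 1))
memory = rng.normal(size=(n, 1))
noise = rng.normal(scale=0.5, size=(n, 6))
answers = np.hstack([numeracy * [1.0, 0.9, 0.8], memory * [1.0, 0.9, 0.8]]) + noise

# Ask the model to recover two underlying categories from the correlations.
fa = FactorAnalysis(n_components=2, random_state=0)
fa.fit(answers)

# Each row is a factor; large loadings show which questions group together
# (questions 1-3 should load on one factor, 4-6 on the other).
print(np.round(fa.components_, 2))
```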
Final lesson on neurodiversity testing and assessments: be an informed consumer
Beware of shiny salespeople telling you that you can get diagnosed online, or that AI can diagnose you based on your social media habits! Tests that haven’t been through this kind of rigorous testing are no better than a horoscope.
They might seem like they apply to you, but they are typically too generic to be helpful. This is called the Barnum Effect.
Marketing which looks professional can also be distracting – this is called the Guru Effect. Studies have shown that if a brand uses a picture of a brain, or the word ‘neuro’, then people will rate it as more credible, even if it is nonsense.
Even psychologists have been found to be vulnerable to these effects, so don’t give yourself a hard time, but DO ask whether a test has been evaluated for standardisation, reliability and validity and if they have a technical manual with the results that you can check. Even better, if someone objective has reviewed the results!
I have just given you a whistle-stop tour through some Bachelor’s degree level social science statistics, and I hope I made it straightforward enough to understand. I also really wish someone had explained it to me like this in 1997, when I was a Psychology undergraduate.
We’ve designed the Genius Finder™ assessment quite purposefully, taking these things into account, but also the need to be practical. It is set up to ask questions that help pinpoint areas for development, so that you can improve your workplace skills and achieve your career potential. We’ve focused specifically on what we know from coaching can be improved with the right changes to environment, actions and self-belief.
Do ask a question if you need any help navigating the confusing bombardment of testing options that are flooding our community.
By Dr Nancy Doyle
Suggested further reading
Dr Nancy Doyle, Genius Within Founder and Chief Science Officer, discusses psychometric assessments and shares a guide published by the Society of Occupational Medicine.
K. R.-H. Teoh, A. McDowall, N. Doyle, R. Kwiatkowski, R. Kurtz and G. Kinman