What Goes Into a Personality Test That Actually Works?

Creating a personality test means building something that captures how a person thinks, feels, and processes the world, then translating those patterns into something meaningful and actionable. A well-constructed test combines psychological theory, carefully written questions, and a scoring framework that reveals genuine behavioral tendencies rather than surface-level preferences.

Most people take personality tests without ever wondering what went into making them. Having spent decades in advertising trying to understand what drives human behavior, I’ve become fascinated by that question. The craft behind a good test is far more nuanced than it looks from the outside.

Person sitting at a desk surrounded by notebooks and frameworks, thoughtfully designing a personality assessment

Our MBTI General and Personality Theory hub covers the broader landscape of personality frameworks and how they apply to real life. This article goes a layer deeper, examining what it actually takes to build a test that measures something real, and why that process matters for anyone who wants to understand themselves more clearly.

Why Does the Foundation of a Personality Test Matter So Much?

Every test begins with a theory. Before a single question gets written, the creator has to decide what psychological construct they’re trying to measure. That choice shapes everything that follows.

The Myers-Briggs Type Indicator, for example, was built on Carl Jung’s theory of psychological types. Isabel Briggs Myers and her mother Katharine Cook Briggs spent years translating Jung’s abstract concepts into observable behavioral preferences. They weren’t just writing questions. They were operationalizing a theory, turning philosophical ideas about how minds work into something measurable.

Compare that to the Big Five model, which emerged from a different starting point entirely. Researchers there used a lexical approach, cataloging all the personality-describing words in the English language and then using statistical analysis to find the underlying dimensions. Two completely different methodologies, two different frameworks, both claiming to capture personality.

A 2020 study published in PLOS ONE via PubMed Central found that personality measurement reliability depends heavily on how well the theoretical constructs are defined before any questions are written. Vague constructs produce inconsistent results, no matter how polished the test looks on the surface.

Early in my agency career, I made a version of this mistake. We were profiling consumer segments for a major retail client, and we started writing survey questions before we’d agreed on what we were actually trying to understand. We ended up with data that told us a hundred different things and nothing useful. My research director at the time pulled me aside and said, “You can’t measure something you haven’t defined.” That conversation has shaped how I’ve approached every diagnostic framework since.

What Dimensions Should a Personality Test Actually Measure?

Once you have a theoretical foundation, the next decision is which dimensions to include. This is where personality test design gets genuinely interesting, and where many amateur attempts fall apart.

Strong personality dimensions share a few characteristics. They’re bipolar, meaning they represent a spectrum between two meaningful poles rather than the presence or absence of a trait. They’re independent, meaning scoring high on one dimension doesn’t automatically predict your score on another. And they’re comprehensive enough to capture real behavioral differences without overlapping each other.
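To make the bipolar idea concrete, here is a minimal scoring sketch: each answered item pushes toward one pole or the other, and the result is a position on a spectrum plus a "clarity" of preference, rather than a trait being simply present or absent. The function name, pole labels, and clarity formula are illustrative assumptions, not taken from any published instrument.

```python
# Sketch: scoring one bipolar dimension (e.g. E vs I) as a spectrum.
# Each item contributes toward one pole; the net result is a preferred
# pole plus a clarity value, not a binary "has trait / lacks trait".
# All names and formulas here are illustrative assumptions.

def score_bipolar(answers, pole_a="E", pole_b="I"):
    """Return (preferred_pole, clarity) from a list of per-item pole picks."""
    a = sum(1 for ans in answers if ans == pole_a)
    b = sum(1 for ans in answers if ans == pole_b)
    total = a + b
    if total == 0:
        return (None, 0.0)
    preferred = pole_a if a >= b else pole_b
    clarity = abs(a - b) / total  # 0.0 = dead center, 1.0 = fully one-sided
    return (preferred, clarity)

print(score_bipolar(["E", "I", "E", "E", "I"]))  # ('E', 0.2)
```

A respondent answering three items toward E and two toward I lands on the E side, but with low clarity, which is exactly the nuance a simple binary label would erase.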

The extraversion-introversion dimension is one of the most studied in all of personality psychology. It appears across virtually every major framework because it reflects something genuinely fundamental about how people gain and spend energy, how they process information, and how they engage with the world. If you want to understand this dimension thoroughly before building anything around it, E vs I in Myers-Briggs: Extraversion vs Introversion Explained walks through the nuances in real depth.

Beyond extraversion and introversion, MBTI-based frameworks measure three additional dimensions: sensing versus intuition, thinking versus feeling, and judging versus perceiving. Each of these captures a genuinely different aspect of how people take in information and make decisions.

What makes modern personality theory richer, though, is the layer beneath these surface dimensions. Cognitive functions, the mental processes that drive each type, add texture that simple preference scores can miss. A test that only measures preferences might tell you someone is an introvert who prefers thinking. A test that probes cognitive functions can distinguish between someone whose thinking is primarily analytical and systematic, what researchers call Introverted Thinking (Ti), versus someone whose thinking is driven by external standards and measurable outcomes, which is the territory of Extraverted Thinking (Te). That’s a meaningful difference that surface-level preference scales often blur.

Abstract visual showing a spectrum between two poles representing personality dimensions in a test framework

How Do You Write Questions That Actually Reveal Personality?

Question writing is where theory meets craft. A poorly written question can undermine even the most solid theoretical framework. A well-written one can reveal things about a person they hadn’t consciously articulated before.

There are several question formats commonly used in personality assessment. Forced-choice questions ask respondents to pick between two statements that both sound reasonable, which reduces social desirability bias. Likert scale questions ask how strongly someone agrees or disagrees with a statement, which captures degree rather than just direction. Situational questions describe a scenario and ask what someone would do, which can reveal behavioral tendencies that abstract self-descriptions miss.
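The Likert format in particular has a scoring wrinkle worth seeing concretely: well-built scales include reverse-keyed items, statements worded toward the opposite pole, so that agreement patterns rather than acquiescence drive the score. The sketch below assumes a 1–5 scale and index-based item keys; both are illustrative choices, not any specific instrument's method.

```python
# Sketch: Likert-item scoring with reverse-keyed items. Some statements
# are worded toward the opposite pole, so agreement on them must be
# flipped before summing. The 1-5 scale and keys are assumptions.

SCALE_MAX = 5  # "strongly agree" on an assumed 1-5 Likert scale

def score_likert(responses, reverse_keyed):
    """Sum 1-5 responses, flipping items whose index is in reverse_keyed."""
    total = 0
    for i, r in enumerate(responses):
        if not 1 <= r <= SCALE_MAX:
            raise ValueError(f"response {r} outside 1-{SCALE_MAX} scale")
        total += (SCALE_MAX + 1 - r) if i in reverse_keyed else r
    return total

# Item 1 is reverse-keyed: answering 2 ("disagree") counts as 4.
print(score_likert([5, 2, 4], reverse_keyed={1}))  # 5 + 4 + 4 = 13
```

Mixing keyed directions this way is a cheap defense against respondents who drift toward one end of the scale regardless of content.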

The challenge with all of these is that people answer based on their self-concept, not always their actual behavior. We tend to describe who we aspire to be as much as who we are. A 2005 piece from the American Psychological Association’s Monitor on Psychology examined how self-perception gaps affect psychological assessment, noting that people consistently rate themselves more favorably on socially desirable traits. Good test design accounts for this by framing questions in ways that make both options feel equally acceptable.

One technique I’ve always respected is the ipsative format, where instead of rating how much you agree with a statement, you’re forced to choose between two equally attractive options. It creates a kind of cognitive friction that bypasses the “best version of myself” filter. Running agency new business pitches, I used a version of this logic when building team assessments. Asking people to choose between “I prefer to work through problems alone first” and “I prefer to think out loud with the group” generated far more honest answers than asking them to rate how collaborative they were on a scale of one to five.

Question length and vocabulary also matter enormously. Questions should be short enough to process quickly, unambiguous enough that different readers interpret them the same way, and free of double-barreled phrasing (asking two things in one question). Every unnecessary word is an opportunity for misinterpretation.

What Makes a Scoring System Reliable and Valid?

Writing good questions is only half the work. How you score those questions determines whether the test actually measures what it claims to measure.

Reliability refers to consistency. A reliable test produces similar results when taken under similar conditions, and similar results when the same person takes it twice within a short window. Validity refers to accuracy. A valid test actually measures the construct it’s designed to measure, not something adjacent or superficially related.

These two qualities are related but distinct. A test can be highly reliable, giving you the same answer every time, while measuring the wrong thing entirely. And a test that’s theoretically measuring the right thing can still produce inconsistent results if the questions are poorly calibrated.
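The "similar results when the same person takes it twice" idea is usually quantified by correlating scores across two sittings. A minimal sketch with Pearson's r, using fabricated scores (real test-retest studies also control the time gap between sittings and the sample size):

```python
# Sketch: a test-retest reliability check. Correlate the same
# respondents' dimension scores across two sittings with Pearson's r;
# values near 1.0 indicate stable scores. Data below is fabricated.

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

first_sitting  = [62, 45, 78, 51, 69]   # five respondents, sitting 1
second_sitting = [60, 48, 75, 53, 71]   # same respondents, sitting 2
print(round(pearson_r(first_sitting, second_sitting), 2))  # 0.99
```

A high r here says nothing about validity, which is exactly the reliable-but-wrong trap described above: the test could be consistently measuring the wrong construct.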

Professional personality assessments go through extensive validation processes. Item analysis examines how each individual question performs, whether it correlates with the dimension it’s supposed to measure and doesn’t correlate too strongly with dimensions it shouldn’t. Factor analysis groups questions statistically to confirm they’re actually measuring the same underlying construct. Test-retest reliability studies check whether scores remain stable over time.
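One standard statistic behind this kind of item analysis is Cronbach's alpha, which measures whether the items on a single dimension hang together as a coherent scale. A self-contained sketch, with a fabricated respondents-by-items score matrix (production work would use a statistics library and much larger samples):

```python
# Sketch: internal-consistency analysis via Cronbach's alpha.
# Rows are respondents, columns are items intended to measure the
# same dimension. Values near 1.0 suggest the items cohere; low
# values flag items measuring different things. Data is fabricated.

def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents x items score matrix."""
    k = len(rows[0])  # number of items
    def variance(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

data = [  # 4 respondents answering 3 items on a 1-5 scale
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [3, 3, 2],
]
print(round(cronbach_alpha(data), 2))  # 0.89
```

In practice, an alpha in the high 0.8s like this toy result would be considered good internal consistency, though alpha alone says nothing about validity.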

A 2008 study indexed in PubMed Central examining personality measurement methodology found that construct validity, the degree to which a test measures the theoretical construct it claims to, is often the most difficult form of validity to establish, precisely because personality constructs are abstract by nature.

One reason mistyping happens so frequently in informal assessments is that many popular online tests skip rigorous validation entirely. They look like personality tests, they feel like personality tests, but they haven’t been put through the process that would confirm they’re measuring what they claim. If you’ve ever suspected your results don’t quite fit, Mistyped MBTI: How Cognitive Functions Reveal Your True Type is worth reading before you take another test.

Close-up of a scoring rubric and statistical analysis chart used in personality test validation research

How Do Cognitive Functions Change the Design Equation?

Most basic personality tests measure preferences. Cognitive function-based tests try to measure something more fundamental: the mental processes that drive those preferences in the first place.

This is a more ambitious design challenge. Preferences are relatively easy to assess through self-report. You can ask someone whether they prefer structured plans or spontaneous decisions and get a reasonably honest answer. Cognitive functions are harder because they describe how a person’s mind actually operates, patterns that the person themselves may not be consciously aware of.

Designing questions that probe cognitive functions requires going beyond “what do you prefer” and into “how do you actually process this.” A question targeting Extraverted Sensing, for instance, might explore whether someone is highly attuned to sensory details in their immediate environment, or whether they trust and act on physical instinct in real time. You can read more about what that actually looks like in practice in Extraverted Sensing (Se) Explained: Complete Guide.

The design challenge is that cognitive functions exist in a hierarchy. Every type has a dominant function, an auxiliary, a tertiary, and an inferior. A well-designed cognitive function test needs to assess not just which functions someone uses, but how prominently each one figures in their mental stack. That requires more questions, more nuanced scoring, and more sophisticated interpretation than a simple preference scale.
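One naive way to turn per-function scores into that hierarchy is simply to rank them. The sketch below does exactly that; the function names follow common MBTI shorthand, but the scores and the sort-based ranking are illustrative assumptions, not how any particular published test resolves a stack (real instruments also check that the ranking is type-consistent).

```python
# Sketch: ranking cognitive-function scores into a stack. The simple
# sort below is a naive illustration -- real functions-based tests use
# more sophisticated scoring -- and the scores are fabricated.

def rank_functions(scores):
    """Order functions from most to least prominent by raw score."""
    return sorted(scores, key=scores.get, reverse=True)

scores = {"Ni": 42, "Te": 35, "Fi": 21, "Se": 9}
stack = rank_functions(scores)
dominant, auxiliary, tertiary, inferior = stack
print(stack)  # ['Ni', 'Te', 'Fi', 'Se'] -- an INTJ-like ordering
```

Even this toy version shows why function-based tests need more items: distinguishing a dominant from an auxiliary function requires enough signal per function to make the ordering stable, not just enough to detect a binary preference.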

Our Cognitive Functions Test was built with exactly this layered approach in mind, probing the full mental stack rather than just surface preferences. If you want to see what a functions-based assessment actually feels like from the inside, that’s a good place to start.

From a design standpoint, incorporating cognitive functions means accepting more complexity in exchange for more accuracy. Not every use case requires that depth. A quick team-building exercise might be well served by a simpler preference-based format. A serious attempt at self-understanding probably warrants the fuller picture.

What Does the Result Interpretation Layer Need to Do?

A test is only as useful as what it tells you at the end. Result interpretation is a design challenge in its own right, and one that many test creators underinvest in.

Good result interpretation does several things simultaneously. It accurately describes the type or profile the scores point to. It contextualizes that description so the person understands what it means in practical terms. It acknowledges nuance, including the fact that most people don’t sit neatly at the extreme ends of any dimension. And it points toward growth without making the person feel like their type is a limitation.

That last point matters more than it might seem. Personality descriptions have a tendency to become self-fulfilling. Tell someone they’re “not naturally collaborative” and they may stop trying to develop that skill. Frame the same characteristic as “someone who does their best collaborative thinking after processing independently first” and you’ve opened a door rather than closed one.

I felt this acutely when I first got serious about understanding my INTJ type. Early descriptions I read made it sound like I was destined to be cold, isolated, and perpetually misunderstood. That framing was genuinely demoralizing. It took finding descriptions that honored the strengths while honestly addressing the growth edges before the whole framework became useful to me. The difference between those two kinds of descriptions is entirely a writing and design choice.

Data from 16Personalities’ global profile research suggests that how people respond to their results varies significantly by type, with some types being far more likely to find their descriptions immediately resonant and others requiring more nuance before the results feel accurate. That variability is a design signal. It tells you that one-size-fits-all result language will serve some users well and others poorly.

How Does Personality Testing Apply in Real Organizational Settings?

Running advertising agencies for two decades meant I was constantly in the business of understanding people: clients, creative teams, account managers, strategists. Personality frameworks weren’t abstract theory for me. They were practical tools for figuring out how to get the best work out of everyone in the room.

The tests that actually helped were the ones that gave people language for differences they’d already felt but couldn’t articulate. A senior creative director on one of my teams was brilliant but consistently resistant to client feedback sessions. For years, people assumed he was arrogant. When we did a team assessment and he came out with a strong introverted processing preference and a high Ti score, something shifted. He wasn’t dismissing the clients. He was genuinely unable to evaluate feedback in real time without feeling like he was compromising his standards. Once the team understood that, we restructured how feedback sessions worked for him. His output improved, and so did the team’s relationship with the client.

That’s what a well-designed test can do in an organizational context. It creates shared vocabulary for differences that would otherwise generate friction. Research from 16Personalities on team collaboration supports this, finding that teams with shared understanding of personality differences report meaningfully higher collaboration quality than those without it.

The flip side is that poorly designed tests in organizational settings can do real damage. A test that misclassifies someone, or one that produces results so generic they could apply to anyone, creates false confidence. Leaders make decisions about roles and responsibilities based on those results, and if the results are wrong, those decisions are built on sand.

One of the more uncomfortable moments in my agency career came when I realized we’d been using a vendor’s personality assessment in our hiring process for almost two years before anyone questioned its validity. When we finally dug into the methodology, it was essentially a repackaged self-help questionnaire with no real validation research behind it. We’d been making consequential hiring decisions based on something that had never been properly tested. That experience made me far more rigorous about evaluating any assessment tool before deploying it.

Diverse team in a workplace setting reviewing personality assessment results together at a conference table

What Separates a Useful Personality Test From a Forgettable One?

Plenty of personality tests exist. Most people who’ve taken a few can tell you which ones felt meaningful and which ones felt like a novelty. What’s the actual difference?

From everything I’ve observed, both as a consumer of these tools and as someone who’s used them professionally, a few qualities consistently separate the useful from the forgettable.

Theoretical grounding matters. Tests built on established psychological frameworks, ones that have been debated, refined, and tested over decades, tend to produce results that hold up over time. Tests built on intuition or trend-chasing tend to feel accurate in the moment but hollow on reflection.

Honest nuance matters. A test that tells you you’re a specific type while acknowledging that types exist on spectrums, and that context shapes expression, is more useful than one that delivers a crisp label and stops there. The science of deep thinking suggests that self-aware individuals often find overly simplified descriptions frustrating precisely because they can feel their own complexity.

Actionability matters. A result that describes who you are without pointing toward what you might do with that information has limited practical value. The best personality frameworks give you a map and a compass, not just a photograph.

And perhaps most importantly, a good personality test invites ongoing reflection rather than closing down the conversation. My own understanding of my INTJ type has evolved considerably over the years. What felt like a definitive answer when I first got typed has become a starting point for much deeper self-examination. A test that positions its results as the final word is doing its users a disservice.

If you haven’t yet established a clear sense of your own type, that’s the natural place to begin. Our free MBTI personality test gives you a solid foundation to work from, and the results come with enough context to make them genuinely useful rather than just a label.

The science of personality assessment is imperfect, and anyone honest about the field will tell you so. A 2005 piece from the APA’s Monitor on Psychology noted that even well-validated instruments capture only a portion of the complexity that makes each person who they are. That’s not a reason to dismiss these tools. It’s a reason to hold their results with appropriate humility, using them as lenses rather than verdicts.

What I find genuinely moving about well-designed personality tests is what they do for people who’ve spent years feeling like they’re wired wrong. Quiet people who’ve been told they need to speak up more. Analytical thinkers who’ve been called cold. Deep processors who’ve been labeled slow. A good test doesn’t just describe these people. It reframes their experience in a way that restores their sense of self-worth. That’s not a small thing. In my experience, it can be profound.

Thoughtful person reading personality test results at home, looking reflective and engaged with self-discovery

Find more articles on personality frameworks, cognitive functions, and the science of self-understanding in our complete MBTI General and Personality Theory Hub.

About the Author

Keith Lacy is an introvert who’s learned to embrace his true self later in life. After 20 years in advertising and marketing leadership, including running agencies and managing Fortune 500 accounts, Keith now channels his experience into helping fellow introverts understand their strengths and build fulfilling careers. As an INTJ, he brings analytical depth and authentic perspective to every article, drawing from both professional expertise and personal growth.

Frequently Asked Questions

What is the first step in creating a personality test?

The first step is defining the psychological construct you want to measure. Before writing a single question, you need a clear theoretical framework that explains what dimensions of personality you’re capturing and why those dimensions matter. Without this foundation, questions become arbitrary and results become unreliable. Most serious personality assessments are built on established frameworks like Jungian typology or the Big Five model precisely because those frameworks have been rigorously defined and debated over decades.

How many questions does a personality test need to be accurate?

There’s no universal answer, but more questions generally produce more reliable results, up to a point. Most validated personality assessments use between 60 and 100 questions to ensure each dimension is measured across multiple items, which reduces the impact of any single poorly answered question. Very short tests (under 20 questions) can still provide useful directional information, but they sacrifice reliability and nuance. The tradeoff is always between accuracy and the time a respondent is willing to invest.

What is the difference between reliability and validity in a personality test?

Reliability means the test produces consistent results under similar conditions. A reliable test gives you roughly the same answer whether you take it today or two weeks from now. Validity means the test actually measures what it claims to measure, not something related but different. A test can be reliable without being valid (consistently measuring the wrong thing), but a valid test must also be reliable. Both qualities are essential for a personality assessment to be genuinely useful.

Can personality tests be used in workplace hiring decisions?

Personality tests can provide useful context in hiring, but they carry real risks when used as primary decision-making tools. Tests that haven’t been validated for occupational prediction can introduce bias and produce misleading results. Most organizational psychologists recommend using personality assessments as one input among many, not as a filter or gate. The most valuable application in workplace settings is building shared vocabulary among existing teams, helping people understand each other’s working styles rather than screening candidates in or out.

How do cognitive functions differ from basic personality preferences in test design?

Basic preference-based tests ask what you prefer, while cognitive function-based tests probe how your mind actually processes information. Preferences are relatively easy to self-report. Cognitive functions are deeper patterns that may not be consciously accessible, which makes them harder to measure but more revealing when captured accurately. A preference test might tell you that you favor thinking over feeling. A cognitive functions test can distinguish whether that thinking is primarily inward and analytical (Introverted Thinking) or outward and systems-oriented (Extraverted Thinking), a distinction that matters significantly for understanding how someone actually operates.
