How do you assess non-academic outcomes?
53% of teachers and 70% of young people believe that life skills such as confidence, motivation and resilience are more important than academic qualifications, according to Sutton Trust research, while only one in five pupils say the school curriculum helps them ‘a lot’ with developing such skills.
Employers frequently reference the skills gap: the Sainsbury Review noted the importance of transferable workplace skills such as communication and problem solving. And there is substantive research that these ‘non-cognitive skills’ are particularly important for helping to close the attainment gap for pupils from lower-income backgrounds.
So why is there so little evidence about what developing and influencing these skills looks like in practice? There are several reasons.
- Less tangible outcomes like resilience and motivation are always going to be more difficult to assess than results on a standardised maths test.
- There are genuine limitations to the degree to which such skills can be meaningfully transferred or taught in isolation, as the work of cognitive scientists like Daniel Willingham shows us.
- There is a real lack of rigorous thinking about definition, measurement and evidence of non-cognitive skills.
This article aims to provide some practical solutions, based on research evidence, about ways to meaningfully assess these outcomes so that schools can:
- get a better sense of whether their efforts in developing these skills are making a difference
- assess the impact of interventions
- make more informed decisions using data beyond the academic.
Thinking about types of practice
In most schools, assessment of non-cognitive outcomes such as motivation, resilience or confidence will be observational: do pupils appear more engaged or more persistent? Some schools may ask about such skills as part of pupil voice activity. But in most cases that is likely to be the extent of it.
In the social sciences, and in particular psychology, there is a long-established strand of research that looks at meaningful ways to assess such outcomes. Typically this would involve the use of ‘validated’ measures i.e. those that have been through a robust academic testing process to ensure they provide accurate and reliable results.
For example, one commonly used measure, the Big Five Inventory, has been found to be a reliable predictor of 18-year-olds’ earnings twenty years later in their lives.
The advantage of using such validated instruments over self-designed questionnaires or teacher observation can be boiled down to three points.
- Reliability. Without specialist academic expertise, it is actually very hard to design an assessment measure that returns consistent data over time and avoids biased responses.
- Validity. Most externally tested instruments will have been used in major studies testing whether their results actually relate to outcomes that matter – for instance, conscientiousness affecting lifetime earnings.
- Viability. Most teachers are unlikely to be expert assessors of these skills, so using externally tested measures makes assessing non-cognitive skills both more feasible and less work.
In using validated instruments for assessing non-cognitive skills, there are three main types we will consider here.
- Pupil self-report
- Teacher report
- Observable data
1. Pupil self-report
Pupil self-report questionnaires are the most widely used form in the research literature for assessing non-cognitive skills. They are fairly easy to implement, don’t generate extra work for teachers – and they’re amongst the most reliable measures.
The Education Endowment Foundation’s SPECTRUM database gives a sense of the huge variety of such measures. Unfortunately, in many cases appropriate measures will be locked away behind academic paywalls or available only on a commercial basis from assessment providers.
However, there are a number of free alternatives that have been well tested and cover outcomes of interest to teachers. Particularly credible and accessible measures include:
- the Motivated Strategies for Learning Questionnaire (MSLQ), which covers outcomes such as metacognition, motivation and test anxiety
- the Self-efficacy Questionnaire for Children (SEQ-C), which assesses pupils’ sense of their own ability to achieve particular outcomes
- the Short Grit Scale (Grit-S), which looks at how pupils can persist and keep on going even on challenging tasks.
(Most of these scales can be found online and used for free, with some careful searching. If you have difficulty finding what you're looking for, please get in touch.)
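Most of these instruments are scored by averaging (or summing) Likert-style item responses, with some items reverse-scored so that agreeing strongly does not always mean ‘more’ of the trait. As a minimal sketch – the item numbers, responses and reverse-scored set here are purely illustrative, not taken from any particular instrument – scoring might look like:

```python
def score_scale(responses, reverse_items, scale_max=5):
    """Average a pupil's Likert responses, flipping reverse-scored items.

    responses: dict mapping item number -> raw response (1..scale_max)
    reverse_items: set of item numbers that are reverse-scored
    """
    total = 0
    for item, raw in responses.items():
        # A reverse-scored item is flipped: on a 1-5 scale, 2 becomes 4, etc.
        total += (scale_max + 1 - raw) if item in reverse_items else raw
    return total / len(responses)

# Hypothetical 4-item questionnaire with items 2 and 4 reverse-scored
print(score_scale({1: 4, 2: 2, 3: 5, 4: 1}, reverse_items={2, 4}))  # → 4.5
```

Always follow the scoring guidance published with the instrument itself; reversing the wrong items will silently corrupt your results.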
These measures can either be completed on paper or (preferably) put into an online surveying tool to gather and analyse pupil responses. Many tools will also provide benchmarking data that enables you to see how results in your school compare against similar-age pupils nationally.
Assessing non-academic outcomes across a MAT
One of our ImpactEd partner multi-academy trusts used grit and self-efficacy measures across eight of their primary schools at the beginning and end of a term to evaluate the impact of enrichment provision. These results are now being used to understand how differences in school implementation influenced impact, and what the most successful approaches are.
These measures should be used with care to ensure comparability of conditions. Self-report questionnaires are prone to ‘social desirability bias’, where pupils answer in the way they feel they are expected to. Aim to run these questionnaires in moderated conditions and make clear that pupils will not be judged on their individual responses – they should simply respond with how they feel.
2. Teacher report
Where pupil self-reports are not available or appropriate (for instance, with very young children), teacher report measures are also a viable option. The Strengths and Difficulties Questionnaire, for instance, is a widely used measure within schools, particularly for inclusion and well-being focused work.
Because these measures are based on teacher judgements, social desirability bias will be less of an issue, but you should take similar approaches to other good assessment practice to ensure consistency. The following are particularly important.
- Before widely using a measure, get a group of teachers to independently rate specific pupils and then compare results. Allow for discussion of measures that may not be interpreted similarly by all teachers, and agree a consistent approach.
- Separate these judgements from accountability or performance management. If teachers are going to be evaluated on their individual judgements, you create a strong disincentive to report accurately.
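The moderation step above can be given a simple numerical check. As an illustrative sketch (the ratings here are invented), you could compute the share of pupils on whom two teachers agree before rolling a measure out more widely:

```python
def percent_agreement(ratings_a, ratings_b, tolerance=0):
    """Share of pupils on whom two raters agree (within an optional tolerance)."""
    assert len(ratings_a) == len(ratings_b)
    matches = sum(
        1 for a, b in zip(ratings_a, ratings_b) if abs(a - b) <= tolerance
    )
    return matches / len(ratings_a)

# Hypothetical ratings of the same six pupils by two teachers (1-5 scale)
teacher_1 = [3, 4, 2, 5, 3, 4]
teacher_2 = [3, 3, 2, 5, 4, 4]
print(round(percent_agreement(teacher_1, teacher_2), 2))  # → 0.67
```

Low agreement is a prompt for the discussion stage: it usually means the measure’s wording is being interpreted differently, not that one teacher is wrong.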
3. Observable data
In addition to direct questionnaire measures, you may also wish to consider indirect measures, using data that you probably already collect, such as behaviour markers, attendance or reward systems. Academics would call these ‘observables’ – data gathered as part of a school’s day-to-day life, reflecting observed behaviour rather than direct pupil responses.
Using observable data has two main benefits.
- Reliability, particularly when using straightforward metrics like attendance, is likely to be significantly higher than questionnaire measures.
- You are likely to be collecting this data anyway, so there is no need for new data collection.
Typically you would look at baseline performance on the metrics you are reviewing and see how this changes over time for your target cohort of pupils (as outlined in my end-to-end guide to impact evaluation).
If you are also collecting teacher or pupil reported data, you can then correlate the data. For example, you could see if an improvement in pupils’ self-efficacy scores was associated with improved attendance at school.
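For the correlation itself, a plain Pearson coefficient is usually enough at this scale. A minimal sketch, using invented per-pupil figures (change in self-efficacy score and change in attendance percentage) rather than real school data:

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of pupil metrics."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical data for five pupils: term-on-term change in self-efficacy
# score alongside change in attendance percentage
efficacy_change = [0.5, -0.2, 0.8, 0.1, 0.4]
attendance_change = [1.0, 0.5, 3.0, -1.0, 2.0]
print(round(pearson(efficacy_change, attendance_change), 2))  # → 0.75
```

With typical cohort sizes, treat such correlations as a prompt for further investigation rather than proof of a causal link: a handful of pupils can swing the coefficient considerably.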
Comparing academic and non-academic data to evaluate impact
One of ImpactEd’s partner schools, St Clement Danes, has been trialling methods over the last two years to improve GCSE attainment of less engaged Year 11 students. Clearly, a metric for success will be if their programme succeeds in improving GCSE results.
However, they are also using validated scales for test anxiety and school engagement as interim measures to see if they are successful in improving student engagement – and, if not, how they can further support students before they take their GCSE examinations.
Think about assessment in this area as you would any other in terms of what is proportionate and genuinely helpful for teachers and young people – the Commission on Assessment without Levels has some useful principles. There are lots of good reasons why you may not want to assess particular skills. If unsure, a useful litmus test is to ask: is this going to be helpful for both our teachers and pupils?
If we are serious about developing pupils’ non-cognitive skills, however, we need to think very carefully about how we know if we are making a difference, and how we can use assessment data to inform that. There are better and worse ways to achieve this: hopefully this overview provides some avenues to explore for doing it better.