Intelligence and Educational Assessment

What you should know

Types and limitations of psychometric tests
Assessing educational performance at different ages (e.g. Key Stages, the role and types of national examinations)

Reading list

Guy R. Lefrançois - Psychology for Teaching (8th Ed) - Chapter 7
Steven R. Banks and Charles L. Thompson - Educational Psychology - Chapter 11
David Fontana - Psychology for Teachers (3rd Ed) - Chapter 5
Dennis Child - Psychology and the teacher (3rd Ed) - Chapter 9

3 basic views of intelligence

1 Psychometric - measurement - what IQ tests measure, general underlying ability (g) problem for racists
2 Piagetian - child/environment interaction
3 Information-processing - cognitive processes (e.g. Sternberg)

Vygotsky (1986) - potential (Hebb calls this intelligence `A').
Current level of intelligence (intelligence `B').
Cattell - fluid abilities - basic - non-verbal - unaffected by experience (more susceptible to old age). Also prospective memory (see Mayner 95).
Crystallised - primarily verbal, influenced by culture and education. Will increase with age (Horn and Donaldson, 1980)
Gardner - 7 unrelated multiple intelligences (see Table 7.1).
Problems

1) some components are difficult to measure.
2) intelligence cannot be separated from culture and education (Gardner and Hatch 1989).

Sternberg's information processing view
Contextual subtheory - intelligence is successful adaptation to the environment. This could be assessed by asking people what is intelligent or stupid in their culture. In North America three broad groupings of abilities emerge:

practical problem-solving ability
verbal ability
social competence

different results might be expected in different cultures

advantage: intelligence is now observable or concrete rather than abstract and academic

disadvantages:

this view is too inclusive - nearly all behaviours are potentially intelligent because any one behaviour may be useful in at least one context even though generally speaking the behaviour may not be that useful elsewhere.
this view does not describe the processes and structures that help to explain intelligence.

Therefore the three component subtheory is also needed

Three component subtheory

Meta-components - executive processes, selecting cognitive abilities, monitoring them and evaluating their results
Performance components - activities used in carrying out cognitive tasks.
Knowledge-acquisition components - activities involved in acquiring new information.

Intelligence tests correlate well with school achievement.
Conventional measures measure the extent to which the individual has profited from past learning experiences.
Vygotsky (1996) and Feuerstein (1979) - to measure learning potential, subjects must be placed in situations in which they must learn, rather than in situations where past learning is tapped.
Bloom (1964) - IQ scores measured at ages 5 and 17 correlate at 0.80.
Use of computers and calculators increases IQ (Salomon et al, 1991).
IQ tests do not tap important qualities, such as interpersonal skills, creativity, athletic ability.
Many are biased against social and ethnic minorities.
`Culture-reduced' tests - non-verbal, use pictures or abstract designs (e.g. Raven's Progressive Matrices test)
McClelland (1973) argues IQ tests bear little relationship to success in life, but Barrett and Depinet (1991) conclude IQ is positively related to job performance.
IQ score ranges from 50 - 160 (average 100)

Group tests given by a teacher

Draw a person - originally developed by Goodenough (1926) and revised by Harris (1963) and Naglieri (1988). Children's drawings are supposed to reflect their conceptual sophistication.
CogAT - a paper and pencil test for grades 3 to 13. Three scores are given - verbal, quantitative, and non-verbal, which are combined to give an overall IQ score. A child's performance can be compared with normative data.
Otis-Lennon School Ability Test - this test is suitable for grades 1 to 12. It yields just one score, known as the "school ability index" (SAI), and is made up of a mixture of items (vocabulary, reasoning, numerical etc.) arranged in order of difficulty.

Individual tests

Expensive, but reliable for important decisions. Need an expert to administer these.

Peabody Picture Vocabulary Test-R

Choose 1 picture out of 4 that matches word spoken by experimenter. Items are arranged in order of difficulty and the test is terminated after six consecutive incorrect answers. The child's IQ is calculated by taking into account what level the child achieved in the test as well as his or her age.
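
The stopping rule described above can be sketched in Python. This is only an illustration, assuming the child's answers are recorded as True/False values in order of item difficulty. The function name `administer` and the values it returns are hypothetical. Converting the level reached (and the child's age) into an IQ is left out, because that depends on the published norm tables.

```python
def administer(responses, max_consecutive_errors=6):
    """Walk through responses (True = correct) in order of item difficulty.

    Stops once `max_consecutive_errors` wrong answers occur in a row.
    Returns (items_given, number_correct).
    """
    consecutive_errors = 0
    number_correct = 0
    items_given = 0
    for correct in responses:
        items_given += 1
        if correct:
            number_correct += 1
            consecutive_errors = 0  # a correct answer resets the run of errors
        else:
            consecutive_errors += 1
            if consecutive_errors == max_consecutive_errors:
                break  # the test is terminated here
    return items_given, number_correct


# Hypothetical child: 10 correct, 1 wrong, 2 correct, then 6 wrong in a row.
# The last 5 items are never presented.
responses = [True] * 10 + [False] + [True] * 2 + [False] * 6 + [True] * 5
items_given, number_correct = administer(responses)
print(items_given, number_correct)
```

In this made-up example the test stops after 19 items, 12 of them answered correctly.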

Revised Stanford-Binet

This test does not use the term IQ; instead the term 'standard age score' (SAS) is used. Items are graded in order of difficulty. Four separate scores are given, which can be combined to give a measure of "adaptive ability":
- verbal reasoning
- quantitative reasoning (maths)
- abstract/visual reasoning (block design, copying figures, predicting what a folded design would look like once unfolded)
- short-term memory (repeating a sentence, reproducing a pattern made of beads, recalling a series of pictures in order)

Wechsler scales (WISC-III)

Similar to Stanford-Binet
Adult and pre-school versions exist.
2 basic sections - verbal (reasoning and vocabulary skills) and performance (visual-spatial skills)

Verbal section has 6 subtests
1. Information test - general knowledge
2. Similarities test - comparing two items for similarity
3. Vocabulary test - defining words
4. Comprehension test - asked about what would be appropriate in a given situation
5. Arithmetic test - questions presented verbally
6. Digit span test - repeating back digits

Performance section has 6 subtests
1. Picture Completion test - identifying missing part of picture
2. Picture Arrangement test - placing pictures in order, so as to tell a story
3. Block design test - Arranging cubes with red and white designs, so as to copy given pattern
4. Object assembly test - constructing an object from pieces that need to be joined in a fixed order
5. Coding test - copying non-verbal symbols in order
6. Mazes test - tracing a maze
See Lefrançois, Table 7.2, p190

SOMPA (System of Multicultural Pluralistic Assessment).

Assesses biological and social normality, and derives an estimated learning potential (ELP) score. The ELP is based on WISC-III scores, is standardised on ethnic minority samples, and takes into account important family variables (e.g. size, income, structure, socio-economic status).
Sattler (1982) criticises SOMPA: the Californian sample is not representative, SOMPA predictions are no more valid than those of the WISC-III alone, and it is not wise to use a medical model for educational decisions. However, SOMPA is good for detecting gifted African-American children who are not detected by other tests (Matthew et al 1992).

Factors that affect manifested intelligence.

Family size and birth order
Ethnic background
Social class

Rubber-band hypothesis

We are all born with different sized rubber bands (potential intelligence). These bands can be stretched. Large bands can be stretched further than small bands, but small stretched bands are longer than unstretched `big' bands.
First-borns and only children tend to have higher measured intelligence and academic performance.
Intellectual climate of home is a function of family size and position in the family (Zajonc).

Definition of creativity

Definitions are given on p197 and relate to the examples on p198.
Gallagher (1960) - teachers miss 20% of the most highly creative students. School dropout is higher for gifted adolescents than for the general population (McMann & Oliver, 1988).
It is a mistake to think that creativity is found only amongst those with the highest IQ. Evidence on pages 199-200.

Measurement of Creativity

Unusual uses test - e.g. list uses for a brick or a nylon stocking. Scored for fluency, flexibility and originality (an original response is one that occurs less than 5% of the time).
High intelligence is important for creativity, but personality and social factors are also important.
Getzels and Jackson (1962) - creative students do not necessarily have the highest IQ, and are not liked by teachers.
High correlation between measured creativity and IQ scores (McCleod & Cropley, 1989)
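
Scoring the unusual uses test can be sketched as follows. This is a rough illustration, not a published scoring scheme. It assumes the usual definitions: fluency is the number of different responses, flexibility is the number of different categories they fall into, and originality is the number of responses given by fewer than 5% of a comparison sample. The categories, the sample frequencies and the function name `score_unusual_uses` are all hypothetical.

```python
def score_unusual_uses(responses, category_of, sample_frequency, rarity_cutoff=0.05):
    """Score one student's list of uses for an object."""
    unique = list(dict.fromkeys(responses))  # drop repeated answers, keep order
    fluency = len(unique)
    flexibility = len({category_of[r] for r in unique})
    # A response missing from the sample table is treated as never given before.
    originality = sum(
        1 for r in unique if sample_frequency.get(r, 0.0) < rarity_cutoff
    )
    return {"fluency": fluency, "flexibility": flexibility, "originality": originality}


# Hypothetical marker's coding of uses for a brick.
category_of = {
    "build a wall": "construction",
    "build a house": "construction",
    "doorstop": "weight",
    "paperweight": "weight",
    "grind into pigment": "material",
}
# Hypothetical proportion of a comparison sample giving each response.
sample_frequency = {
    "build a wall": 0.60,
    "build a house": 0.45,
    "doorstop": 0.20,
    "paperweight": 0.12,
    "grind into pigment": 0.01,
}

responses = ["build a wall", "doorstop", "paperweight", "grind into pigment"]
scores = score_unusual_uses(responses, category_of, sample_frequency)
print(scores)
```

Here the student gives 4 responses in 3 categories, and only one response is rare enough to count as original.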

Guilford's model of intelligence.

Guilford (1959) see fig 7.7 p202.
120 distinct human abilities
Allows for creativity and intelligence in one model.

Implications for teachers

1 Complexity of intellectual processes.
2 Variety of forms in which ability can be expressed.
3 Importance of instructional process. Need greater emphasis on creative thinking, evaluation, implications, etc. Programs designed to foster learning/thinking strategies.

Divergent and convergent thinking.

Divergent is generating several ideas from a given problem.
Convergent is deriving one solution from a given set of facts.
Divergent thinking is synonymous with creative thinking.

Validity

Face

- appears to measure what it is supposed to.

Content

- is it measuring what is being taught?

Construct

- does the test measure a hypothetical variable (construct) that is also measured by other tests? e.g. is extroversion a meaningful concept?

Criterion-related

Concurrent

- agrees with other tests

Predictive

- predicts later performance

Reliability

- affected by improvement (with age)
- affected by chance (especially with multiple-choice) - best to make tests longer or to use many shorter ones.

repeated measures
parallel forms - 2 versions of same test
split-half reliability
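
Split-half reliability can be illustrated with a short Python sketch. It assumes each student's answers are recorded item by item (1 = correct, 0 = incorrect) and that the test is split into odd-numbered and even-numbered items, which is one common way of splitting. The Spearman-Brown step, which estimates the reliability of the full-length test from the half-test correlation, is a standard addition that is not mentioned in these notes. The data are made up.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(spread_x * spread_y)


def split_half_reliability(item_scores):
    """item_scores: one list per student, one entry per test item."""
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_totals, even_totals)
    # Spearman-Brown correction: each half is only half as long as the real test.
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full


# Hypothetical results: 5 students, 6 items each.
item_scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
r_half, r_full = split_half_reliability(item_scores)
print(round(r_half, 2), round(r_full, 2))
```

The corrected figure is higher than the half-test correlation, which fits the advice above that longer tests are more reliable.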

Maguire (1992) - Teachers often just teach students to pass a test.
Wolf et al - current school tests assess the skill of detecting and selecting answers rather than generating them.
They are memory-based, rather than designed to promote thinking.

Standardized tests

- students' results are compared with norms.

Use

Placement
certify achievement
Judge teachers
evaluate schools
Instructional diagnoses

In America - anti-testing movement in the 1950s and 1960s because tests were thought to be unfair.
But the report `A Nation at Risk' (1983) persuaded teachers to use tests again.
Nolen and Haas (1991) - raising educational standards is equated with raising test scores.
Teachers are embarrassed by tests, so they teach children to pass tests, which invalidates tests.
Teacher-made tests - essays - maths tests - used to grade or see whether ready for next module.

Types and limitations of psychometric tests

Teachers set their own tests because the tests can cover the material that they have taught. Packages may be too broad.

Evaluation should motivate students rather than demotivate them. Tests provide feedback to the students, telling them what needs to be improved and what parts of the curriculum have been mastered.

Tests are also used to make schools more accountable. In America many schools are being too generous with allocating grades (known as 'grade inflation'). The same standardised tests, used by many schools, should guard against grade inflation.

ARE INTELLIGENCE TESTS BIASED?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Some critics believe standard IQ tests are biased against certain
personality types. For example, Banesh Hoffmann, a Queens College mathematician
and author of "The Tyranny of Testing", wrote in 1962 that standardised
tests disadvantage "intellectually honest candidates with subtle, probing
critical or creative minds" - an enduring criticism that refuses to die
away. In fact, only a few years ago, an academic, Robert Reich, who had
previously served as the US politician responsible for employment,
criticised standardised tests because of their inability to measure
creativity, an attribute he considered vital to many current jobs.

Donald Powers (Educational Testing Service, Princeton, USA) and James
Kaufman (University of California, USA) investigated the relationship
between the Graduate Record Examination (GRE) scores of 342 students and
their conscientiousness, rationality, ingenuity, quickness, creativity and
depth, as measured by self-report personality questionnaires. The GRE is an
IQ-type test used to select candidates for postgraduate study in America.

Overall, the researchers found no substantive evidence to support the
criticisms made by Hoffmann and others that IQ-type tests are biased against
creative types. Any links between intelligence scores and personality were
modest and, in fact, relative to the low creativity scorers, there was a
tendency for the students with higher creativity scores to perform better
on the analytical, quantitative and verbal measures of the Graduate Record
Examination.
_______________________________________

Powers, D.E. & Kaufman, J.C. (2004). Do standardised tests penalise
deep-thinking, creative, or conscientious students? Some personality
correlates of Graduate Record Examination test scores. Intelligence, 32,
145-153.

Journal weblink: http://www.sciencedirect.com/science/journal/01602896

Robert Reich's article in Education Week (free registration required):
http://www.edweek.com/ew/ewstory.cfm?slug=41reich.h20&keywords=reich

Purchase "The Tyranny of testing":
http://www.amazon.co.uk/exec/obidos/ASIN/0313200971/qid=1079705589/sr=1-1/ref=sr_1_0_1/026-5190474-4787621

Graduate Record Examinations: http://www.gre.org/

Formative Evaluations

These are tests and essays given at various times throughout a course, in order to find out what needs to be improved. It may be the student who needs improving, or perhaps the teaching! It's best to use criterion-referenced evaluation (Gronlund & Linn, 1990).

Summative Evaluations

End-of-course or end-of-module test, used to grade a student. Best to use norm-referenced evaluation (Gronlund & Linn, 1990).

Objective Tests

Types of objective test item (see Fig 13.4):

Completion
Matching
True-false
Multiple choice

Essay versus Objective tests

1 Essays make it easier to tap higher-level processes - organisation, inferences etc.
2 Essay content is limited - an essay can't test a wide variety of topics.
3 Essays make divergent answers possible.
4 Essay questions are easy for the teacher to construct.
5 Scoring essays takes longer. Objective tests can be scored using computer technology.
6 Essay scores are unreliable (Educational Testing Service, 1961):

300 essays rated by 53 judges on 9 point scale
one-third received all possible grades
37% received 8 different grades
23% received 7 different grades
some markers gave moderate marks
others gave extremes
knowledge of student affects scores
halo effect - first few good answers affect how rest of essay is marked.

Suggestions for constructing tests

Essays

Questions should be specific for easy scoring.
Restricted response easier to score (as opposed to open-ended or extended-response). For example: In two paragraphs or less, list two similarities and two differences, etc.
Allow students sufficient time to use high-level processes (e.g. planning).
Weighting specified.
Wording should make clear the teacher's expectations.
Scoring - outline model answers, and score all answers to one question before going on to the next.
Intend to be objective.
Specify the number of points available for each part of essay, eg content, organization, application, synthesis of ideas.

Multiple-choice items

Stems

Stems should be complete; not half of a sentence which is completed by one of the alternatives.
Make stems clear and concise
Stems should be longer than the alternatives
Questions should be positive (eg 'Which one was....') rather than negative (eg 'Which one was not...').

Alternatives

should be grammatically consistent with stem
each should be a plausible answer
students, when guessing, tend to choose either the first or the longest alternative.

Problems

Unable to measure problem-solving ability
Unable to measure critical thinking
Does not measure the student's ability to develop and express an idea
Does not measure the application of knowledge to new situations

Norms and normal distribution

Students should make sure that they understand the

'Normal Distribution Curve'
Mean and Standard deviation
percentiles - eg 75th percentile is the score where 75% of subjects fall on or below it
Z-scores - equivalent to standard deviations, with the mean being equal to a Z-score of 0
T-scores - a T score of 50 is the mean and a T-score interval of 10 is equal to one standard deviation (eg a T-score of 60 is 1 standard deviation above the mean)
Stanines - a score of 1 is 2 standard deviations below the mean, 3 is 1 standard deviation below, 5 is the mean, and so forth up to a maximum of 9 (2 standard deviations above the mean)

(see pages 372-3 Lefrançois, pages 345-7 Banks & Thompson)
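
The standard scores listed above can be worked through in Python. This is a sketch under stated assumptions: the scale is taken to be normally distributed, the standard deviation of 15 for an IQ-style scale is an assumption (these notes only give the average of 100), and the stanine formula simply follows the description above (mean 5, one stanine per half standard deviation, limited to 1-9). The function names are illustrative only.

```python
from math import floor
from statistics import NormalDist


def z_score(raw, mean, sd):
    """Standard deviations that a raw score lies above (+) or below (-) the mean."""
    return (raw - mean) / sd


def t_score(z):
    """T-scores have a mean of 50 and a standard deviation of 10."""
    return 50 + 10 * z


def stanine(z):
    """Stanines run from 1 to 9 with a mean of 5; one stanine is half an SD wide."""
    return max(1, min(9, floor(2 * z + 5.5)))


def percentile(z):
    """Percentage of a normal distribution falling on or below z."""
    return 100 * NormalDist().cdf(z)


# Hypothetical IQ-style scale: mean 100, standard deviation 15 (assumed).
z = z_score(115, mean=100, sd=15)
print(z, t_score(z), stanine(z), round(percentile(z)))
```

A score of 115 on this assumed scale is one standard deviation above the mean: a Z-score of 1, a T-score of 60, a stanine of 7, and roughly the 84th percentile.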

Reporting Test Results

Central tendency - mean
median
mode (not really useful!)
The Standard Deviation calculation is illustrated in Table 13.4
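
As a worked example, the measures of central tendency and the standard deviation can be computed in Python. The scores are made up. The sketch divides by N (the population formula); Table 13.4 may instead divide by N - 1, so treat that choice as an assumption.

```python
from math import sqrt
from statistics import median, multimode


def describe(scores):
    """Return the mean, median, mode(s) and standard deviation of a list of scores."""
    n = len(scores)
    mean = sum(scores) / n
    # Standard deviation: square each deviation from the mean, average, take the root.
    squared_deviations = [(s - mean) ** 2 for s in scores]
    variance = sum(squared_deviations) / n  # assumption: divide by N, not N - 1
    return {
        "mean": mean,
        "median": median(scores),
        "modes": multimode(scores),  # a list, because there can be more than one mode
        "sd": sqrt(variance),
    }


# Hypothetical test scores for a class of eight.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(scores)
print(summary)
```

For these scores the mean is 5, the median 4.5, the mode 4 and the standard deviation 2.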

Criterion-referenced testing

Anecdote (story) about having to leave the lowlands before dark or else be eaten. Even if you are last, as long as you are high enough before dark you are just as well off as the person who was first.
Norm-referenced - compare to others
therefore a student can be seen as good in a class of low ability
or a student can be seen as bad in a class of high ability
Criterion-referenced - pass a criterion (as in the above anecdote).
Choice depends upon what is being tested.
Easy to set criteria for typing, less so for social studies. Criterion-referenced - basic skills; norm-referenced - higher-level skills (Hopkins, Stanley and Hopkins 1990).
Criterion referencing - no student need consistently fail. This can lead to grade inflation. Suitable cut-off points could be derived from the norm-referenced results of the previous year's classes. Exclusive reliance on criterion referencing may thwart students' initiative.
Norm-referenced - better for predicting academic success, but decreases cooperative learning and interaction.
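
The difference between the two approaches can be shown with a small Python sketch. The class scores, the cut-off of 60 and the function names are all hypothetical. The same raw score passes the criterion in either class, but looks good against a low-ability class and bad against a high-ability class.

```python
from statistics import mean, pstdev


def criterion_referenced(score, cutoff):
    """Pass or fail against a fixed standard, whatever anyone else scored."""
    return "pass" if score >= cutoff else "fail"


def norm_referenced(score, class_scores):
    """Standing relative to the class, as standard deviations from the class mean."""
    return (score - mean(class_scores)) / pstdev(class_scores)


# Hypothetical classes and student.
low_ability_class = [40, 45, 50, 55, 60]
high_ability_class = [70, 75, 80, 85, 90]
student_score = 65

print(criterion_referenced(student_score, cutoff=60))
print(round(norm_referenced(student_score, low_ability_class), 2))
print(round(norm_referenced(student_score, high_ability_class), 2))
```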

Cureton's (1971) recommended cut-off points for norm-referenced data
Grade	Standard deviations from mean	Percentage of students achieving grade
A	More than 1.5 above	7
B	0.5 to 1.5 above	24
C	0.5 below to 0.5 above	38
D	1.5 below to 0.5 below	24
F (Failure)	More than 1.5 below	7
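
The table above can be checked against the normal curve with a short Python sketch. It assumes scores are normally distributed. How a score exactly on a cut-off is graded is not stated in the table, so awarding the higher grade at the boundary is an assumption, and the function names are illustrative.

```python
from statistics import NormalDist

# Lower bound of each grade, in standard deviations from the mean.
CUTOFFS = [(1.5, "A"), (0.5, "B"), (-0.5, "C"), (-1.5, "D")]


def grade_from_z(z):
    """Grade for a Z-score; a score exactly on a cut-off gets the higher grade (assumed)."""
    for lower_bound, grade in CUTOFFS:
        if z >= lower_bound:
            return grade
    return "F"


def expected_percentages():
    """Percentage of a normal distribution that falls into each grade band."""
    cdf = NormalDist().cdf
    return {
        "A": 100 * (1 - cdf(1.5)),
        "B": 100 * (cdf(1.5) - cdf(0.5)),
        "C": 100 * (cdf(0.5) - cdf(-0.5)),
        "D": 100 * (cdf(-0.5) - cdf(-1.5)),
        "F": 100 * cdf(-1.5),
    }


rounded = {grade: round(pct) for grade, pct in expected_percentages().items()}
print(rounded)
print(grade_from_z(0.8))
```

Rounded to whole numbers, the bands come out at 7, 24, 38, 24 and 7 per cent, matching the table.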