Intelligence and Educational Assessment


What you should know

Reading list

3 basic views of intelligence


Vygotsky (1986) - potential (Hebb calls this intelligence `A').
Current level of intelligence (intelligence `B').
Cattell - fluid abilities: basic, non-verbal, unaffected by experience (but more susceptible to decline in old age). Also prospective memory (see Mayner 95).
Crystallised abilities - primarily verbal, influenced by culture and education. These increase with age (Horn and Donaldson, 1980).
Gardner - 7 unrelated multiple intelligences (see Table 7.1).
Problems



Sternberg's information processing view
Contextual subtheory - intelligence is the successful adaptation to the environment. This could be assessed by asking people what is intelligent or stupid in their culture. In North America three broad groupings of abilities emerge:

  1. practical problem-solving ability
  2. verbal ability
  3. social competence

different results might be expected in different cultures

advantage: intelligence is now observable or concrete rather than abstract and academic

disadvantages:

  1. this view is too inclusive - nearly all behaviours are potentially intelligent because any one behaviour may be useful in at least one context even though generally speaking the behaviour may not be that useful elsewhere.
  2. this view does not describe the processes and structures that help to explain intelligence.

Therefore the three component subtheory is also needed

Three component subtheory

  1. Meta-components - executive processes, selecting cognitive abilities, monitoring them and evaluating their results
  2. Performance components - activities used in carrying out cognitive tasks.
  3. Knowledge-acquisition components - activities involved in acquiring new information.

Intelligence tests correlate well with school achievement.
Conventional measures measure the extent to which the individual has profited from past learning experiences.
Vygotsky (1996) and Feuerstein (1979) - to measure learning potential, subjects must be placed in situations in which they must learn, rather than in situations where past learning is tapped.
Bloom (1964) - correlation of 0.80 between IQ measured at ages 5 and 17.
Use of computers and calculators increases IQ (Salomon et al, 1991).
IQ tests do not tap important qualities, such as interpersonal skills, creativity, athletic ability.
Many are biased against social and ethnic minorities.
`Culture-reduced' tests - non-verbal, use pictures or abstract designs (e.g. Raven's Progressive Matrices test).
McClelland (1973) argues IQ tests bear little relationship to success in life, but Barrett and Depinet (1991) conclude IQ is positively related to job performance.
IQ score ranges from 50 - 160 (average 100)

Group tests given by a teacher

Individual tests

Expensive, but reliable for important decisions. Need an expert to administer these.

SOMPA (System of Multicultural Pluralistic Assessment).

Assesses biological and social normality and derives an estimated learning potential (ELP) score. The ELP is based on WISC-III scores, is standardised on ethnic minority samples, and takes into account important family variables (e.g. size, income, structure, socio-economic status).
Sattler (1982) criticises SOMPA - the Californian sample was not representative, and SOMPA predictions are no more valid than WISC-III alone. It is not wise to use a medical model for educational decisions. However, SOMPA is good for detecting gifted African-American children who are not detected by other tests (Matthew et al 1992).

Factors that affect manifested intelligence.

Rubber-band hypothesis

We are all born with different sized rubber bands (potential intelligence). These bands can be stretched. Large bands can be stretched further than small bands, but small stretched bands are longer than unstretched `big' bands.
First-borns and only children have higher intelligence and academic performance.
Intellectual climate of home is a function of family size and position in the family (Zajonc).

Definition of creativity

These are on p197 and relate to the examples on p198.
Gallagher (1960) - teachers miss 20% of the most highly creative students. School dropout for gifted adolescents is higher than for general population (McMann & Oliver, 1988)
Mistake to think that creativity is to be found only amongst those with the highest IQ. Evidence on pages 199-200.

Measurement of Creativity

Unusual uses test - e.g. brick or nylon stocking. Score for fluency, flexibility and originality (occurs less than 5% of the time).
High intelligence is important for creativity, but so are personality and social factors.
Getzels and Jackson (1962) - creative students do not necessarily have the highest IQ. They are not liked by teachers.
High correlation between measured creativity and IQ scores (McCleod & Cropley, 1989)
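
The three scores from the unusual uses test can be sketched in code. This is only an illustration: the category mapping and the comparison-group frequencies below are invented for the example, and a use that is missing from the frequency table is assumed to be rare.

```python
def score_unusual_uses(responses, categories, norm_frequency, rare_below=0.05):
    """Score one person's list of uses for an object.

    responses      -- the uses the person wrote down
    categories     -- hypothetical mapping from a use to a broad category
    norm_frequency -- hypothetical mapping from a use to the proportion of a
                      comparison group that gave the same use
    rare_below     -- a use counts as original if it occurs less often than this
    """
    # Normalise and drop repeated answers while keeping their order.
    unique = list(dict.fromkeys(r.strip().lower() for r in responses))

    fluency = len(unique)                                       # how many ideas
    flexibility = len({categories.get(r, r) for r in unique})   # how many kinds of idea
    # Assumption: a use absent from norm_frequency is treated as never seen (0.0).
    originality = sum(1 for r in unique if norm_frequency.get(r, 0.0) < rare_below)

    return {"fluency": fluency, "flexibility": flexibility, "originality": originality}


# Invented data for the object "brick".
categories = {
    "build a wall": "construction",
    "doorstop": "weight",
    "paperweight": "weight",
    "grind into pigment": "art",
}
norm_frequency = {
    "build a wall": 0.60,
    "doorstop": 0.20,
    "paperweight": 0.10,
    "grind into pigment": 0.01,
}

scores = score_unusual_uses(
    ["Build a wall", "doorstop", "paperweight", "grind into pigment"],
    categories,
    norm_frequency,
)
print(scores)
```

Here the person gives four uses (fluency), from three categories (flexibility), one of which falls under the 5% threshold (originality).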

Guilford's model of intelligence.

Guilford (1959) see fig 7.7 p202.
120 distinct human abilities
Allows for creativity and intelligence in one model.

Implications for teachers

Divergent and convergent thinking.

Divergent is generating several ideas from a given problem.
Convergent is deriving one solution from a given set of facts.
Divergent thinking is synonymous with creative thinking.

Validity

Face

- appears to measure what it is supposed to.

Content

- is it measuring what is being taught?

Construct

- hypothetical variables (constructs), which should also be measured by other tests - e.g. is extroversion a meaningful concept?

Criterion related

Reliability

- affected by improvement (with age)
- affected by chance (especially with multiple-choice) - best to make tests longer or to use many shorter ones.
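
Why longer tests reduce the effect of chance can be shown with a small calculation. The sketch below assumes a simple model that is not in the notes: every item has the same number of options and the student guesses blindly and independently on each one, so the number correct follows a binomial distribution.

```python
import math


def chance_score_spread(n_items, n_options=4):
    """Return (expected proportion correct, standard deviation of that proportion)
    for a student who guesses every item. Assumes independent, equally likely guesses."""
    p = 1 / n_options
    sd = math.sqrt(p * (1 - p) / n_items)
    return p, sd


for n in (10, 40, 160):
    p, sd = chance_score_spread(n)
    print(f"{n:>3} items: expected {p:.0%} correct by guessing, spread (SD) {sd:.1%}")
```

Under this model the spread of chance scores halves each time the test is made four times longer, so lucky or unlucky guessing matters less on a long test.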

Maguire (1992) - Teachers often just teach students to pass a test.
Wolf et al - Current school tests assess the skill of detecting and selecting answers rather than generating them.
They are memory based, rather than designed to promote thinking.

Standardized tests

- students' results compared with norms.

Use


In America - anti-testing movement in the 1950s and 1960s because tests were thought to be unfair.
But the report `Nation at risk' (1983) persuaded teachers to use tests again.
Nolen and Haas (1991) - raising educational standards is equated with raising test scores.
Teachers are embarrassed by tests, so they teach children to pass tests, which invalidates tests.
Teacher-made tests - essays - maths tests - used to grade or see whether ready for next module.

Types and limitations of psychometric tests

Teachers set their own tests because the tests can cover the material that they have taught. Packages may be too broad.

Evaluation should motivate students rather than demotivate them. Tests provide feedback to the students, telling them what needs to be improved and what parts of the curriculum have been mastered.

Tests are also used to make schools more accountable. In America many schools are being too generous with allocating grades (known as 'grade inflation'). The same standardised tests, used by many schools, should guard against grade inflation.

ARE INTELLIGENCE TESTS BIASED?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Some critics believe standard IQ tests are biased against certain
personality types. For example, Banesh Hoffmann, a Queens College mathematician
and author of "The Tyranny of Testing", wrote in 1962 that standardised
tests disadvantage "intellectually honest candidates with subtle, probing
critical or creative minds" - an enduring criticism that refuses to die
away. In fact, only a few years ago, an academic, Robert Reich, who had
previously served as the US politician responsible for employment,
criticised standardised tests because of their inability to measure
creativity, an attribute he considered vital to many current jobs.

Donald Powers (Educational Testing Service, Princeton, USA) and James
Kaufman (University of California, USA) investigated the relationship
between the Graduate Record Examination (GRE) scores of 342 students and
their conscientiousness, rationality, ingenuity, quickness, creativity and
depth, as measured by self-report personality questionnaires. The GRE is an
IQ-type test used to select candidates for postgraduate study in America.

Overall, the researchers found no substantive evidence to support the
criticisms made by Hoffmann and others that IQ-type tests are biased against
creative types. Any links between intelligence scores and personality were
modest and, in fact, relative to the low creativity scorers, there was a
tendency for the students with higher creativity scores to perform better
on the analytical, quantitative and verbal measures of the Graduate Record
Examination.
_______________________________________

Powers, D.E. & Kaufman, J.C. (2004). Do standardised tests penalise
deep-thinking, creative, or conscientious students? Some personality
correlates of Graduate Record Examination test scores. Intelligence, 32,
145-153.

Journal weblink: http://www.sciencedirect.com/science/journal/01602896

Robert Reich's article in Education Week (free registration required):
http://www.edweek.com/ew/ewstory.cfm?slug=41reich.h20&keywords=reich

Purchase "The Tyranny of testing":
http://www.amazon.co.uk/exec/obidos/ASIN/0313200971/qid=1079705589/sr=1-1/ref=sr_1_0_1/026-5190474-4787621

Graduate Record Examinations: http://www.gre.org/

Formative Evaluations

These are tests and essays given at various times throughout a course, in order to find out what needs to be improved. It may be the student who needs improving, or perhaps the teaching! It's best to use criterion-referenced evaluation (Gronlund & Linn, 1990).

Summative Evaluations

End-of-course or end-of-module test. Used to grade a student. Best to use norm-referenced evaluation (Gronlund & Linn, 1990).

Objective Tests

Fig 13.4 -

Essay versus Objective tests


300 essays rated by 53 judges on 9 point scale
one-third received all possible grades
37% received 8 different grades
23% received 7 different grades
some markers gave moderate marks
others gave extreme marks
knowledge of student affects scores
halo effect - first few good answers affect how rest of essay is marked.

Suggestions for constructing tests

Essays

Questions should be specific for easy scoring.
Restricted response easier to score (as opposed to open-ended or extended-response). For example: In two paragraphs or less, list two similarities and two differences, etc.
Allow sufficient time for students to use high-level processes (i.e. planning).
Weighting specified.
Wording should make clear the teacher's expectations.
Scoring - outline model answers for one answer before going onto the next.
Aim to be objective.
Specify the number of points available for each part of the essay, e.g. content, organization, application, synthesis of ideas.

Multiple-choice items

Stems

Alternatives

Problems

Norms and normal distribution

Students should make sure that they understand the

(see pages 372-3 Lefrançois, pages 345-7 Banks & Thompson)

Reporting Test Results

Central tendency - mean
median
mode (not really useful!)
The Standard Deviation calculation is illustrated in Table 13.4
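
The measures above can be computed with Python's standard `statistics` module. The marks are made up for the example, and the population form of the standard deviation (dividing by N) is an assumption, since Table 13.4 is not reproduced in these notes.

```python
import statistics

# Hypothetical marks for a class of seven students.
marks = [45, 60, 60, 65, 70, 75, 80]

mean = statistics.mean(marks)      # arithmetic average
median = statistics.median(marks)  # middle mark when sorted
mode = statistics.mode(marks)      # most frequent mark
sd = statistics.pstdev(marks)      # population standard deviation (divides by N)

print(f"mean={mean}, median={median}, mode={mode}, SD={sd:.2f}")
```

Use `statistics.stdev` instead of `pstdev` if the marks are treated as a sample from a larger group (it divides by N - 1).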

Criterion-referenced testing

Anecdote (story) about having to leave the lowlands before dark or else be eaten. If you are last, as long as you are high enough before dark you are just as well off as the person who was first.
Norm-referenced - compare to others
therefore student can be seen as good in a class of low ability
or student can be seen as bad in a class of high ability
Criterion-referenced - pass a criterion (as in the above anecdote).
Choice depends upon what is being tested.
Easy to set criteria for typing, less so for social studies. Criterion-referenced - basic skills, Norm-referenced - higher-level skills (Hopkins, Stanley and Hopkins 1990).
Criterion referencing - no student need consistently fail. This can lead to grade inflation. Suitable cut-off points could be derived from the norm-referenced results of the previous year's classes. Exclusive reliance on it may thwart students' initiative.
Norm-referenced - better for predicting academic success, but decreases cooperative learning and interaction.
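
The contrast between the two approaches can be sketched as follows. The marks, the pass mark of 60 and the function names are all invented for illustration, and the z-score (distance from the class mean in standard deviations) is used here as one simple way of comparing a student with others.

```python
import statistics


def criterion_referenced(mark, pass_mark):
    """Pass or fail against a fixed standard, regardless of how others did."""
    return mark >= pass_mark


def norm_referenced(mark, class_marks):
    """Position relative to a comparison class, as a z-score."""
    mean = statistics.mean(class_marks)
    sd = statistics.pstdev(class_marks)
    return (mark - mean) / sd


student_mark = 65
low_ability_class = [40, 45, 50, 55, 60]   # hypothetical comparison groups
high_ability_class = [70, 75, 80, 85, 90]

passed = criterion_referenced(student_mark, pass_mark=60)
z_low = norm_referenced(student_mark, low_ability_class)
z_high = norm_referenced(student_mark, high_ability_class)

print(f"criterion-referenced: passed={passed}")
print(f"norm-referenced: z={z_low:+.2f} in the low class, z={z_high:+.2f} in the high class")
```

The same mark passes the fixed criterion in both cases, but looks well above average in one class and well below average in the other.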

Cureton's (1971) recommended cut-off points for norm-referenced data

Grade         Standard deviations from mean   Percentage of students achieving grade
A             1.5 above                       7
B             0.5 to 1.5 above                24
C             0.5 below to 0.5 above          38
D             1.5 below to 0.5 below          24
F (Failure)   1.5 below                       7
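
The percentages in the table follow from the cut-off points if marks are assumed to be normally distributed. The sketch below checks this with the standard library and shows how a z-score could be turned into a grade. The handling of a score lying exactly on a boundary is an assumption, as the table does not say which grade it belongs to.

```python
from statistics import NormalDist

# (grade, lower bound in standard deviations from the mean), highest grade first.
CUTOFFS = [("A", 1.5), ("B", 0.5), ("C", -0.5), ("D", -1.5)]


def grade_for(z):
    """Grade for a mark expressed as a z-score.
    Assumption: a score exactly on a boundary gets the higher grade."""
    for grade, lower in CUTOFFS:
        if z >= lower:
            return grade
    return "F"


def expected_shares():
    """Proportion of a normally distributed class falling in each grade band."""
    cdf = NormalDist().cdf  # standard normal: mean 0, SD 1
    return {
        "A": 1 - cdf(1.5),
        "B": cdf(1.5) - cdf(0.5),
        "C": cdf(0.5) - cdf(-0.5),
        "D": cdf(-0.5) - cdf(-1.5),
        "F": cdf(-1.5),
    }


for grade, share in expected_shares().items():
    print(f"{grade}: {share:.1%}")
```

Rounded to whole numbers, the shares come out as 7, 24, 38, 24 and 7 per cent, matching the table.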

Ethics of testing

Personality tests can invade privacy when they probe into matters that would not ordinarily be publicly revealed.
Tests are threatening when placement, job opportunities, success and failure - depend upon their results.
Can be unjust when used with groups for whom they were not designed.
In America, parents have a right of access to school files. When adolescents reach 18, they themselves can have access.
Current testing practices - G. Grant (1991) "test behaviours that are easy to measure, encourage individual accomplishment and competitiveness rather than group performance".

The American Scholastic Aptitude Test (SAT)

Dates back to 1926. Used as an entrance exam for college. Taken by 1 million students in 1991 (Dodge 1991). Two subtests - Verbal and Mathematical

Test bias in the SAT

Average SAT scores in 1991 (Dodge 1991)

                    Verbal   Maths
African Americans   351      385
Whites              441      489

The difference in maths scores between the two groups is unlikely to be cultural, because maths is relatively independent of cultural bias. The difference is better explained as reflecting a socioeconomic bias: African Americans are over-represented in the lower socioeconomic ranges. Should something be done about this? Can anything be done?

Can students be prepared for tests?

Messick (1982) - substantial improvements can be made
Cunningham (1986) - modest improvements for short-term courses. Intensive training may produce greater gains. Maths scores can be improved more than verbal scores.
Such courses are expensive. This accentuates the pre-existing socioeconomic bias.

The decline in SAT scores

Year   Verbal   Maths
1963   478      502
1981   424      466
1991   422      474

Overall decline in the combined score is about 9.2% from 1963 to 1981 (980 to 890) and about 8.6% from 1963 to 1991 (980 to 896).
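
The size of the decline can be checked directly from the table. The sketch below assumes that "overall" means the combined Verbal + Maths score, which the notes do not state explicitly.

```python
# (verbal, maths) averages copied from the table above.
sat = {1963: (478, 502), 1981: (424, 466), 1991: (422, 474)}


def percent_decline(start_year, end_year, scores=sat):
    """Percentage fall in the combined (verbal + maths) score between two years."""
    start = sum(scores[start_year])
    end = sum(scores[end_year])
    return 100 * (start - end) / start


for end_year in (1981, 1991):
    print(f"1963 to {end_year}: {percent_decline(1963, end_year):.1f}% decline")
```

With the table's figures the combined score falls by roughly 9.2% between 1963 and 1981, and by roughly 8.6% between 1963 and 1991, because maths scores recovered slightly after 1981.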

Possible explanations

Alternative testing techniques

Developmental assessment - actual accomplishments.
Do not compare with others.
Example of checklist Fig 13.6.

Sampling Performances of thought

Sereda (1992) - child explains what he is thinking as he attempts to solve a problem.
Points awarded for correct strategies, etc.
(See italics on p379)

Exhibitions

Wolf et al (1991) emphasise the profoundly social nature of thinking.
Perform in front of others.
Oral examinations
Musical recitations, etc.

Portfolios

Collection of any evidence of ability, collected over much of the student's time at school.

For

Emphasis on learning how to learn.
Autonomous, reflective, independent, creative thinking.
Might expose social and intellectual skills.

Against

Cumbersome, time consuming, less exact.
Not easily quantifiable - not suitable for deciding which students get admitted to college, or for deciding who gets a scholarship.
Ewell (1991) Generates volumes of material - but no way to analyse it.

The British National Curriculum and Testing

Key Stage 1
Key Stage 2
Key Stage 3
Key Stage 3 Assessment and Testing
GCSE
GNVQ

Other web-pages

US backs quest for brightest children Guardian 06-06-00

a) frying pan or b) fire? Guardian 06-06-00

Testing and Grading by Dr John Lackey

Melanie Phillips on intelligence testing
