Session 2:
The Role of Frequency in Word Selection and Learning
Sam Barclay
REVIEW
Vocabulary is just one of the goals for language learners. Others include language features (grammar and pronunciation), ideas such as subject and cultural knowledge, skills such as strategies and fluency, and discourse knowledge such as text structures and conversational rules.
Vocabulary was overlooked in traditional structuralist, behaviourist, and communicative approaches to language learning. Now, it is increasingly recognised as a vital part of a language programme. Teachers need to adopt a systematic approach to teaching and learning it.
Vocabulary knowledge correlates strongly with language performance.
REVIEW
There are different ways of counting vocabulary. Counting word types is the most sensitive method, as each inflected form is counted as a new word. Lemmas include the base (also called the stem or root) and its inflections. Flemmas include the base, its inflections, and forms identical to the base or its inflections that belong to a different part of speech. Word families include the base, inflections, and derivations, which makes the word family the most inclusive counting method.
The counting method we use reflects variables such as purpose (e.g., reading, speaking) and proficiency (e.g., whether learners have morphological knowledge).
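To see how the choice of counting unit changes the total, here is a minimal Python sketch. The sample tokens and both mapping tables (`LEMMA_OF`, `FAMILY_OF`) are hand-made, hypothetical examples; they are not a real lemmatiser or a published word-family list.

```python
# Hand-made, hypothetical mappings for illustration only.
# Inflected form -> lemma (base plus inflections).
LEMMA_OF = {
    "develop": "develop",
    "develops": "develop",
    "developed": "develop",
    "development": "development",
    "developments": "development",
    "developer": "developer",
}

# Lemma -> word-family headword (base, inflections, and derivations).
FAMILY_OF = {
    "develop": "develop",
    "development": "develop",
    "developer": "develop",
}


def count_units(tokens):
    """Count the same tokens as types, lemmas, and word families."""
    lemmas = {LEMMA_OF[token] for token in tokens}
    families = {FAMILY_OF[lemma] for lemma in lemmas}
    return {
        "tokens": len(tokens),
        "types": len(set(tokens)),
        "lemmas": len(lemmas),
        "families": len(families),
    }


sample = ["develop", "develops", "developed", "development",
          "developments", "developer", "develop"]
print(count_units(sample))
```

With this toy sample, seven running words reduce to six types, three lemmas, and one word family, which is why vocabulary size figures cannot be compared unless the counting unit is stated.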
More on lemmas and word Families
Bauer & Nation (1993) identified 6 levels based on…
The number of words the affix occurs in e.g., -er (FREQ)
The likelihood that the affix will be used to form new words e.g., -ly (PROD)
The degree of predictability of the meaning of the affix e.g., -s (PRED)
The degree of predictability of change in the written form when the affix is added e.g., -ish (ORTH REG)
The degree of predictability of change in the spoken form when affix is added e.g., -er (PHONO REG)
The regularity of the spelling of the affix e.g., in- (im-, il-, ir-) (AFFIX ORTH REG)
The regularity of the spoken form of the affix e.g., -ed (AFFIX PHONO REG)
Regularity of function e.g., -ess always makes noun from noun (REG FUNCTION)
More on lemmas and word Families
Bauer & Nation (1993) identified 6 levels:
1. Each form is a different word. At this level it is assumed that learners will not recognise that book and books are members of the same word family.
2. Inflectional suffixes. At this level, words with the same base and inflections are considered one item.
3. Most frequent and regular derivational affixes. Only orthographic alternations, such as <y> becomes <i>, are permitted: -able, -er, -ly, -ness, non-, etc.
4. Frequent, orthographically regular affixes: -al, -ation, -ess, -ful, -ism
5. Regular but infrequent affixes: ante- (antechamber), -ward (homeward), circum- (circumnavigate)
6. Frequent but irregular affixes: -ee (nominee), -ion (redemption), -th (depth)
S1 Activity
Reflect on your learning and teaching experience to date as you consider the following situation.
How would you answer that student who asks, “Could you tell me how to study vocabulary?” or “Which words should I learn?”?
Please write your answers and bring them to our next session.
Pre-session reading
In the next session, we're going to talk about how we define vocabulary. In preparation for that session, please read the following:
Webb & Nation (2017), Chapter 1
Coxhead (2000), journal article
After reading, please answer the following questions and be ready to discuss your answers in the session.
What principles would you choose to select words to teach to your students?
What role should word frequency play in our teaching?
How can we manipulate texts to promote the learning of high-frequency / academic vocabulary?
Is frequency the only metric we should use to judge usefulness? Can you think of any more?
How many words do L1 speakers of English know?
McLean, Hogg, and Kramer (2014)
Brysbaert et al (2016)
L1 users of English know between 11,000 and 16,000 word families. Many L2 users of English do not get anywhere near this figure.
WHAT DOES IT MEAN TO KNOW A WORD?
Common distinction made between vocabulary size/breadth and vocabulary depth/quality of knowledge (Anderson & Freebody, 1981; Read, 2004).
These are important concepts to understand, but are not well defined (Schmitt & Schmitt, 2020).
What do you understand by these terms?
WHAT DOES IT MEAN TO KNOW A WORD?
Vocabulary size/breadth
How many words are known?
Vocabulary depth/quality
How well are those words known?
🡪 Seems simple, but actually a little more complicated
WHAT DOES IT MEAN TO KNOW A WORD?
Why is it more complicated?
It depends on how the vocabulary knowledge construct is conceptualised and operationalised.
Size - how are we measuring word knowledge? How are we counting words – as word types, lemmas, word families?
Let’s look at two approaches to conceptualising vocabulary knowledge
The Word Knowledge Approach and the Developmental Approach
THE WORD KNOWLEDGE APPROACH
Describes everything that can be known about a word.
Describes maximal knowledge
Register more important for swear words, for example.
So, vocabulary knowledge is more than the form-meaning connection.
WHAT DOES IT MEAN TO KNOW A WORD?
Word Knowledge Approach
Jack Richards (1976)
Form
Meaning(s)
Frequency
Lexical and grammatical collocation
Register
Syntactic behaviour
Associations
Derivations
WHAT DOES IT MEAN TO KNOW A WORD?
Word Knowledge Approach
Paul Nation (1990, 2001, 2013)
Spoken form
Written form
Word parts
Form and meaning
Multiple meanings
Associations
Syntactic behaviour
Lexical and grammatical collocations
Register/frequency
THE WORD KNOWLEDGE APPROACH
THE DEVELOPMENTAL APPROACH
Number of ways of conceptualising word knowledge (Henriksen, 1999)
Partial 🡪 precise knowledge of word meaning
Depth of knowledge of different word-knowledge aspects
Receptive knowledge 🡪 Productive knowledge
So, vocabulary learning is incremental in many ways. Learner knowledge develops in terms of which aspects are known, but also in terms of how well each aspect is known.
Do you know more vocabulary receptively or productively?
Receptive knowledge is larger than productive knowledge: Laufer (2005) found that 16% of words at the 5,000 frequency level were known productively, and 35% at the 2,000 level. Often, we don’t see vocabulary knowledge because we look only for developed, not developing, word knowledge.
THE DEVELOPMENTAL APPROACH
Tests can look at form and meaning recall and recognition (Schmitt, 2010)
Which do you think is easiest? Which is hardest?
WHAT DOES IT MEAN TO KNOW A WORD?
Laufer & Goldstein, 2004
Order of test difficulty for form-meaning link
TYPES OF VOCABULARY AND THE IMPORTANCE OF FREQUENCY
TYPES OF VOCABULARY
Problem: many words in English (600,000+ word families)
How can we choose which words to teach?
Most useful vocabulary, but WHICH WORDS ARE MOST USEFUL?
What’s the purpose for learning English? (ESP/EAP/EFL)
What’s the current proficiency level?
Which learning will give you the most bang for your buck?
THE ROLE OF FREQUENCY IN WORD SELECTION
Not all words are created equal
Zipf’s Law (1935): The frequency of a word is inversely proportional to its rank, so each doubling of rank halves the frequency (e.g., 7, 3.5, 1.75, 0.875 at ranks 1, 2, 4, and 8).
“few very high-frequency words that account for most of the tokens in a text…and many low-frequency words” (Piantadosi, 2014)
You can try by clicking here
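The inverse relationship between rank and frequency can be checked with a few lines of Python. This is a minimal sketch of the idealised law only, assuming frequency = top frequency / rank; the function name `zipf_frequencies` and the starting value of 7 are illustrative, and real corpora follow the pattern only approximately.

```python
def zipf_frequencies(top_frequency, n_ranks):
    """Idealised Zipfian frequencies for ranks 1..n_ranks: f(r) = f(1) / r."""
    return [top_frequency / rank for rank in range(1, n_ranks + 1)]


frequencies = zipf_frequencies(7.0, 8)
for rank, frequency in enumerate(frequencies, start=1):
    print(f"rank {rank}: {frequency:.3f}")
```

Under this idealisation, the figures 7, 3.5, 1.75, and 0.875 fall at ranks 1, 2, 4, and 8: every doubling of rank halves the expected frequency.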
THE ROLE OF FREQUENCY IN WORD SELECTION
In English, we have the following figures (Nation & Waring, 1997):
Top 10 word families = 20% of language use (mainly function words)
Top 50 word families = 35% of language use
Top 100 word families = 41% of language use
Top 2,000 word families = 80% of language use
But, we know that beyond very frequent words, frequency depends on domain…
THE ROLE OF FREQUENCY IN WORD SELECTION
Rank the following words in order of frequency.
About
Approximately
Nice
Significant
Bad
Mitochondria
Lexis
How would your answers be different if you were considering a medical, academic, or linguistic context?
FREQUENCY MATTERS
Frequency correlates with learnability (Ellis, 2002)
Frequent words encountered more often 🡪 facilitates learning.
Reading low-frequency words harder in L1 and L2 (van Heuven, 2014)
Strange that assessments encourage use of low-frequency words…!
HISTORY – VOCABULARY CONTROL MOVEMENT
Basic English (Ogden & Richards, 1925)
850 words, but 12,000 meaning senses!
General Service List (West, 1953)
most common 2,000 words.
Ranked for frequency
Multiple meaning senses and parts of speech.
Word Categories
We have different categories of words
Paul Nation (RANGE)
1K + 2K = high frequency
Academic
Low-frequency
Schmitt & Schmitt (2014)
1K – 3K = high frequency
4K – 9K = mid frequency
9K+ = low frequency
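The Schmitt & Schmitt (2014) bands can be expressed as a simple lookup by frequency rank. This is a sketch under stated assumptions: the function name `frequency_band` is hypothetical, ranks are taken to be word-family ranks in some frequency list, and the cut-offs (3,000 and 9,000) are treated as inclusive upper bounds.

```python
def frequency_band(rank):
    """Return the frequency band for a rank (1 = most frequent item).

    Assumed cut-offs: ranks 1-3,000 high, 3,001-9,000 mid, above 9,000 low.
    """
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    if rank <= 3000:
        return "high"
    if rank <= 9000:
        return "mid"
    return "low"


for rank in (150, 3000, 4500, 12000):
    print(rank, frequency_band(rank))
```

Under Nation's categories the first cut-off would instead sit at 2,000, with academic vocabulary treated as a separate category rather than a rank band.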
General Vocabulary
What do you understand by general vocabulary?
General Vocabulary
Useful across a wide range of topics / contexts / speaking / writing
Frequency and distribution across domains important
This vocabulary is stable over time (Kilgarriff, 2007)
Teachers should prioritise high-frequency words (Nation, 2001; Read, 2004)…How many words is this?
Nation (2001) – 2,000
Schmitt & Schmitt (2014) – 3,000
Dang & Webb (2017) – 1,000
Students need 2,000-3,000 words, but 1,000 is a good initial learning goal.
General Vocabulary
Frequency is not everything (Nation, 1990)
Context (blackboard, textbook)
Age
Also, need to think about specific needs of learners.
Frequent in which context?
Which meaning sense is most frequent?
Mid Frequency (Schmitt & Schmitt, 2014)
After high-frequency, vocabulary develops according to exposure.
Words become more domain specific (Gardner, 2014)
So, after high-frequency, teach vocabulary according to specific needs of learners.
Difficult to acquire mid-frequency vocabulary incidentally (it’s not frequent enough) 🡪 domain-specific word lists (Coxhead, 2000; Dang, 2017)
Low frequency
“As the benefits of learning low-frequency words in terms of added coverage are rather limited, and there are so many of them, it is not very useful to dedicate a lot of classroom attention to low-frequency words” (Vilkaitė-Lozdienė & Schmitt, 2019)
🡪 Frequency will depend on the corpus used. Medical terms might be low-frequency in a general corpus but higher in a targeted corpus. Useful for ESP.
Specific word lists
Focus on domain-specific vocabulary after acquiring high-frequency items (Nation, 2001).
How do you know what items are domain specific?
Could look in textbooks and infer.
Could ask domain experts.
Could conduct corpus research.
Can use Keyword function. This tells us which words are significantly more frequent in one corpus than another.
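One statistic often used for this kind of keyword comparison is log-likelihood. The sketch below is an assumption about how a keyword tool might score a word, not a description of any particular program; the function name `log_likelihood` and all counts are hypothetical.

```python
import math


def log_likelihood(freq_target, size_target, freq_reference, size_reference):
    """Log-likelihood keyness of one word across two corpora.

    freq_*: occurrences of the word in each corpus.
    size_*: total number of tokens in each corpus.
    """
    combined_freq = freq_target + freq_reference
    combined_size = size_target + size_reference
    # Expected counts if the word were equally frequent in both corpora.
    expected_target = size_target * combined_freq / combined_size
    expected_reference = size_reference * combined_freq / combined_size

    score = 0.0
    for observed, expected in ((freq_target, expected_target),
                               (freq_reference, expected_reference)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


# Hypothetical counts: 50 hits per 1,000 tokens vs 10 hits per 1,000 tokens.
print(round(log_likelihood(50, 1000, 10, 1000), 2))
```

A score above 3.84 corresponds to p < 0.05 on a chi-square distribution with one degree of freedom. The score alone does not show direction, so the relative frequencies still need to be compared to see which corpus over-uses the word.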
Specific Word Lists
Academic Vocabulary
Frequency and distribution important.
More frequent in academic corpus than general English corpus.
Should be distributed across academic domains 🡪 general academic vocabulary.
Most important = AWL (Coxhead, 2000).
Assumes knowledge of GSL.
Comprises 570 word families.
Provides about 7-10% coverage of most academic texts.
Other lists include AVL (Gardner & Davies, 2014)
lemma based
does not assume knowledge of high-frequency vocabulary.
Better coverage of academic texts than AWL (see Gardner & Davies, 2014).
THE ROLE OF FREQUENCY IN WORD SELECTION
Technical Vocabulary
Engaging with specific contexts
plumbing, speaking in hard science, etc.
Frequency is important, distribution less important.
Words frequent in specific corpus
Not distributed across a range of domains.
The level of specificity depends on purpose of corpus.
Leads to specific word lists.
Specific word lists
Issues
Purpose very important (speaking, reading, etc.)
Polysemous words (words with many meaning senses).
brief
case
paper
closing
Specific and general vocabulary is important, but it is the core high-frequency words that do most of the work!
Look at this extract from Simpson-Vlach & Ellis (2010).
Can you understand it?
Just technical words
lexical bundle – frequency – word sequences – psycholinguistically – frequency profiles – selection criteria – frequency-based lists – formulaic expressions
Technical and academic words
Crucial factor – achieving – goal – principles – identifying – lexical bundle – approach – colleagues – solely – frequency – straightforward – word sequences – collapse distinctions – relevant – sequences – psycholinguistically – sequences – frequency profiles – academic – selection criteria – frequency-based lists – formulaic expressions
High-frequency, general academic, and specific
A crucial factor in achieving this goal lies in the principles for identifying and classifying such units. The lexical bundle approach of ____ and colleagues, based solely on frequency, has the advantage of being _______ straightforward, but results in long lists of ______ word sequences that collapse distinctions that ____ would ____ relevant. For example, few would argue with the ____ claim that sequences such as ‘on the other hand’ and ‘at the same time’ are more psycholinguistically ____ than sequences such as ‘to do with the’ or ‘i think it was’ even though their frequency profiles may put them on equivalent lists. Selection criteria that allow for ____ weeding of purely frequency based lists, as used by ____ in a study of formulaic expressions in academic speech, yield much shorter lists of expressions that may ____ to ____ ____, but they are ____ ____ and open to claims of ____.
Why is all this important?
Lexical coverage
What % of text y do you need to understand?
How many words in text y occur on wordlist x?
If my learners know word list x, they are likely to understand z% of text y.
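The coverage questions above reduce to a simple calculation: the percentage of running words (tokens) in a text that appear on a given list. A minimal Python sketch follows; the function name `lexical_coverage`, the sample sentence, and the `known_words` set are hypothetical, and the tokeniser is deliberately crude. It also matches exact word forms, whereas published coverage studies typically match word families or lemmas.

```python
import re


def lexical_coverage(text, known_words):
    """Percentage of tokens in `text` that appear in `known_words`."""
    # Crude tokeniser: lower-cased alphabetic strings, optional apostrophe part.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in known_words)
    return 100 * known / len(tokens)


# Hypothetical word list x and text y.
known_words = {"the", "cat", "sat", "on", "mat", "near"}
text = "The cat sat on the mat near the mitochondria."
print(f"{lexical_coverage(text, known_words):.1f}% coverage")
```

Here eight of the nine tokens are on the list, so coverage is about 88.9%, and a single unknown word in a short sentence is already enough to fall below the 95% figure often cited for reading.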
Lexical Coverage
The relationship between vocabulary and reading:
95% - Laufer (1989)
98% - Hsueh-Chao & Nation (2000)
Cline – Schmitt et al. (2011)
Lexical Coverage
So, how many word families does it take to reach 95% (minimal) and 98% (optimal) coverage (Laufer & Ravenhorst-Kalovski, 2010)?
In reading:
95% = 4,000 – 5,000 word families
98% = 8,000 – 9,000 word families
Lexical Coverage - Listening
The relationship between vocabulary and listening:
90% - Minimal
95% - Optimal
(see van Zeeland & Schmitt, 2013)
Lexical Coverage - Listening
So, how many word families does it take to reach 90% (minimal) and 95% (optimal) coverage (van Zeeland & Schmitt, 2013)?
In listening:
90% = > 2,000 word families
95% = 2,000 - 3,000 word families
Lexical Coverage
So, which words should you prioritise?
What should you do as teachers to encourage comprehension and learning of those words?
Think about a textbook you know. What do you think of the word selection?
Evaluating word lists
What criteria would you use to evaluate word lists?
How old?
What unit of counting?
What’s it for?
How was it made?
How was it validated?
Answer these questions for the AWL
Frequency + Formulaic Language
There are also many useful lists of formulaic language:
Phrasal Verbs – Garnier and Schmitt (2015)
Phrasal expressions – Martinez and Schmitt (2012)
Collocations - https://pearsonpte.com/wp-content/uploads/2014/07/AcademicCollocationList.pdf
Frequency + Formulaic Language
How does frequency differ when you are looking at more than one word?
Mutual Information (MI) (see Hunston 2002):
Measure of strength of collocation – how often they co-occur.
Compare co-occurrence vs separate occurrence
An MI score of >= 3 is usually taken as significant.
Frequency + Formulaic Language
How does frequency differ when you are looking at more than one word?
t-score (see Hunston 2002):
Measure of confidence of co-occurrence.
A t-score of >= 2 is usually taken as significant.
Read https://wordbanks.harpercollins.co.uk/other_doc/statistics.html
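Both measures compare how often two words actually co-occur (observed) with how often they would co-occur by chance (expected). The sketch below uses the common simplified formulas, MI = log2(observed / expected) and t = (observed - expected) / sqrt(observed), with expected = f(word 1) × f(word 2) / corpus size. The function names and all counts are hypothetical, and the adjustment for the size of the collocation window that corpus tools usually apply is left out.

```python
import math


def expected_cooccurrence(freq_node, freq_collocate, corpus_size):
    """Co-occurrences expected by chance (no window-size adjustment)."""
    return freq_node * freq_collocate / corpus_size


def mutual_information(observed, freq_node, freq_collocate, corpus_size):
    """MI: strength of association. Requires observed > 0."""
    expected = expected_cooccurrence(freq_node, freq_collocate, corpus_size)
    return math.log2(observed / expected)


def t_score(observed, freq_node, freq_collocate, corpus_size):
    """t-score: confidence that the association is not due to chance."""
    expected = expected_cooccurrence(freq_node, freq_collocate, corpus_size)
    return (observed - expected) / math.sqrt(observed)


# Hypothetical counts: two rare words that co-occur 8 times in 1 million tokens.
mi = mutual_information(8, 100, 200, 1_000_000)
t = t_score(8, 100, 200, 1_000_000)
print(f"MI = {mi:.2f}, t = {t:.2f}")
```

In this example both thresholds are met (MI >= 3, t >= 2). In general, MI tends to favour pairs of low-frequency words that rarely appear apart, while the t-score tends to favour pairs that co-occur many times, so the two measures can rank the same collocations quite differently.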
Frequency + Formulaic Language
SOUR PUSS
FALLING PRICES
TASKS
Complete the pre-session reading for the next class.
Complete Exercise 3 on page 136 of Schmitt & Schmitt (2020). In addition to the text, reflect on how easy it was to complete this task.
Areas you could research
What derivational knowledge do learners know in your context?
What affixes can we expect most learners to know?
Look at the strategies used by learners to understand low-freq. items.
Use teachers and learners to validate a word list.
Develop a word list of subject-specific vocabulary
Look at textbooks that claim to be at the same level. How consistent is the vocabulary coverage of these books?
Replicate a coverage study.
Useful tools
Complete Lexical Tutor – https://www.lextutor.ca/vp/
AWL highlighter - https://www.eapfoundation.com/vocab/academic/highlighter/
PHaVe list flashcards – https://quizlet.com/br/389971142/the-phave-list-flash-cards/
DDL – http://flax.nzdl.org/greenstone3/flax
Further reading in the area
Listen to Averil Coxhead talk about the development of wordlists. https://www.teachingenglish.org.uk/article/what-do-esp-teachers-need-know-about-word-lists-language-learning-teaching
Watch Charlie Browne talk about the development of the NGSL: http://www.newgeneralservicelist.org/tedtalk
Read about Mark Davies and Dee Gardner creating the Academic Vocabulary List. https://www.academicvocabulary.info/x.asp
Read about how Melodie Garnier and Norbert Schmitt made the PHaVe list. https://journals.sagepub.com/doi/10.1177/1362168814559798