
CogSci 131

The poverty of the stimulus

Tom Griffiths

Language as a formal system

Noam Chomsky

Language
"a set (finite or infinite) of sentences, each finite in length and constructed out of a finite set of elements"

[Figure: the space of "all sequences", with each sequence labeled 1 (grammatical, e.g. "This is a good sentence") or 0 (ungrammatical, e.g. "Sentence bad this is")]
"linguistic analysis aims to separate the grammatical sequences which are sentences of L from the ungrammatical sequences which are not"

The Chomsky hierarchy

Languages             Machines
Computable            Turing machine
Context sensitive     Bounded TM
Context free          Push-down automaton
Regular               Finite state automaton
Finite

Geography

Outline
The argument for innate knowledge
Break
Formal models of learning

Plato's problem

"How comes it that human beings, whose contacts with the world are brief and personal and limited, are able to know as much as they know?"
(via Bertrand Russell)

Socrates. And at present these notions have just been stirred up in him, as in a dream; but if he were
frequently asked the same questions, in different forms, he would know as well as any one at last?
Meno. I dare say.
Socrates. Without any one teaching him he will recover his knowledge for himself, if he is only asked
questions?
Meno. Yes.
Socrates. And this spontaneous recovery of knowledge in him is recollection?
Meno. True.
Socrates. But if he did not acquire the knowledge in this life, then he must have had and learned it at
some other time?
Meno. Clearly he must.
Socrates. Which must have been the time when he was not a man?
Meno. Yes.
Socrates. And if there have been always true thoughts in him which only need to be awakened into knowledge by putting questions to him, his soul must have always possessed this knowledge.
(Plato, Meno)

Modernizing Plato
Q: Why do we know so much?
A: "aspects of knowledge and understanding are innate, part of our biological endowment, genetically determined, on a par with the elements of our common nature that cause us to grow arms and legs rather than wings." (Chomsky)

Language and Plato's problem

"we must determine how the child comes to master the rules and principles that constitute the mature system of knowledge of language"

Three factors:
1. genetically determined principles of the language faculty
2. genetically determined general learning mechanisms
3. linguistic experience

The poverty of the stimulus

The rules and principles that constitute the mature system of knowledge of language are actually very complicated.
There isn't enough evidence to identify these principles in the data available to children.
Therefore:
Acquisition of these rules and principles must be a consequence of the genetically determined structure of the language faculty.

Universal grammar
The set of languages that human beings can learn must be strongly constrained
(in a way that allows the rules and principles of grammar to be identified from limited input).

The theory of this set is "universal grammar"
(which is not really a grammar).

The innate constraints that are expressed in universal grammar are specific to language
(e.g., simplicity isn't good enough).

Gavagai!
(Quine, 1960)

Possibilities:

Rabbit
Dinner
Undetached rabbit parts
Momentary rabbit-stage
Mass of rabbithood
Temporal cross-section of a four-dimensional spacetime extension of a rabbit

Other constraints
"The child approaches language with an intuitive understanding of such concepts as physical object, human intention, volition, causation, goal, and so on."

To what extent is this understanding innate?
(Do we have innate domain-specific knowledge?)

Controversy
Are the premises of the argument valid?

The poverty of the stimulus

The rules and principles that constitute the mature system of knowledge of language are actually very complicated.
There isn't enough evidence to identify these principles in the data available to children.
Therefore:
Acquisition of these rules and principles must be a consequence of the genetically determined structure of the language faculty.

Controversy
Are the premises of the argument valid?
"There isn't enough evidence to identify these principles in the data available to children."
How should we interpret this?
- weak sense: something is innate
- strong sense: linguistic nativism

To evaluate this premise, we need:
- an accurate idea of the data seen by children
- an understanding of the limits of learning

Break

Up next:
Formal models of learning

Formal models of learning
(at the computational level)

Gold's theorem
- identification in the limit
Probably Approximately Correct learning
- more common in modern machine learning

What should we conclude from these models?
- do we accept the assumptions?
- do we accept the conclusions?


Gold's game
How do we show that a set of hypotheses about the structure of language can be learned using a particular algorithm?
Gold's approach: define a game that the learner plays against an adversary.
If the learner can define an algorithm that wins the game for every hypothesis in the set, that set is learnable by that algorithm.

Rules of the game

A set of hypotheses is fixed in advance.
You choose a learning algorithm:
- chooses a hypothesis from the set for any input
I choose a hypothesis from the set, and:
- generate a text for this hypothesis (positive examples only)
- an infinite sequence, with every element appearing at least once
You win if your learning algorithm identifies my hypothesis in the limit:
- at some point t, and at all points thereafter, it chooses the right hypothesis

(A minimal simulation of this game is sketched below.)
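To make the rules concrete, here is a minimal Python sketch (the names, like run_game, are mine, not Gold's): a language is a frozenset of numbers and a learner is a function from the observed prefix to a guess. Since identification in the limit can never be verified by any finite run, the sketch only inspects the guess after a fixed horizon.

```python
from itertools import islice

def run_game(learner, target, text, horizon=50):
    """Play a finite prefix of Gold's game.

    learner: maps a tuple of observed examples to a guessed language.
    target:  the adversary's chosen language (a frozenset).
    text:    an iterator of positive examples; a legal text presents
             every element of `target` at least once.

    Identification in the limit cannot be checked by a finite run,
    so this only reports the guess after `horizon` examples.
    """
    seen = []
    guess = None
    for x in islice(text, horizon):
        seen.append(x)
        guess = learner(tuple(seen))
    return guess == target
```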

Languages and numbers

Our languages will just be sets of numbers.
That's fine by Chomsky: languages are just sets of sequences, and we can assign numbers to possible sequences.

If this is counter-intuitive, think of hairy dogs:
1 = "the hairy dog runs"
2 = "the hairy hairy dog runs"
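A one-line sketch of this numbering for the hairy-dog example:

```python
def sentence(n):
    """The n-th sentence in the slide's toy numbering:
    1 -> 'the hairy dog runs', 2 -> 'the hairy hairy dog runs', ..."""
    return "the " + "hairy " * n + "dog runs"

print(sentence(2))  # the hairy hairy dog runs
```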

Game 1
Set of hypotheses:
{1}, {1, 2}, {1, 2, 3}, ..., {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
(call these L1, L2, L3, ..., L10)

Should you play? (One winning strategy is sketched below.)
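Game 1 is winnable. One winning strategy, as a sketch: guess the smallest hypothesis consistent with everything seen so far. The text for the true Ln must eventually show n, and never shows anything larger, so the guess stabilizes on Ln.

```python
def game1_learner(seen):
    """Guess L_m = {1, ..., m} for m the largest example seen so far."""
    return frozenset(range(1, max(seen) + 1))

# On a text for L3 = {1, 2, 3}, the guess fixes once 3 has appeared:
text = [1, 1, 3, 2, 1, 3]
for t in range(1, len(text) + 1):
    print(sorted(game1_learner(tuple(text[:t]))))
# [1], [1], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3]
```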

Game 2
Set of hypotheses:
{1}, {1, 2}, {1, 2, 3}, ..., {1, 2, 3, 4, ...}
(call these L1, L2, L3, ..., L∞)

Should you play?

How I make my text

Start with a sequence of 1s:
- your algorithm must eventually choose L1
Now add 2 to the sequence:
- your algorithm must change its choice
I can force changes after each choice.

The algorithm either changes infinitely often on L∞, and loses, or fixes on some hypothesis:
- if it fixes on Ln, I complete my text with a text for L∞
- if it fixes on L∞, I complete my text with a text for Ln

(A bounded demonstration of this forcing strategy is sketched below.)
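A bounded demonstration of the forcing argument (the learner and adversary here are my constructions, not from the slides). Against a learner that always guesses the smallest consistent finite language, revealing one new number per round forces a mind-change every round; a learner that instead jumps to L∞ is beaten by repeating old numbers forever, which is a legal text for some finite Ln.

```python
def cautious_learner(seen):
    """Always guess the smallest consistent hypothesis, L_max(seen)."""
    return frozenset(range(1, max(seen) + 1))

def forcing_adversary(learner, rounds=10):
    """Reveal 1, 2, 3, ... in turn and count the learner's mind-changes.
    No finite run settles the matter, but the pattern is clear: one
    forced change per round, i.e. infinitely many on the text for L_inf."""
    text, prev, changes = [], None, 0
    for n in range(1, rounds + 1):
        text.append(n)
        guess = learner(tuple(text))
        if guess != prev:
            prev, changes = guess, changes + 1
    return changes

print(forcing_adversary(cautious_learner))  # 10 changes in 10 rounds
```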

Game 3
Set of hypotheses:
{2,3,4,5}, {1,3,4,5}, {1,2,4,5}, {1,2,3,5}, {1,2,3,4}
(call these L1, L2, ..., L5)

Should you play?

Game 4
Set of hypotheses:
N − {1}, N − {2}, N − {3}, ..., where N = {1, 2, 3, ...}
(call these L1, L2, L3, ...)

Should you play?

How you make your algorithm

Start with L1.
On seeing a 1, change to Ln for the smallest n not seen so far.
At some point, the algorithm fixes on a hypothesis that is not eliminated by further evidence.

(This algorithm is sketched below.)
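The slide's algorithm, sketched in Python. Representing each co-finite hypothesis Ln = N − {n} by its missing element n is my encoding choice, not the slides'.

```python
def game4_learner(seen):
    """Guess L_m = N - {m} for the smallest m not observed so far.
    The text for the true L_n eventually shows every number below n,
    and never shows n itself, so the guess fixes on n."""
    observed = set(seen)
    m = 1
    while m in observed:
        m += 1
    return m  # stands for the hypothesis N - {m}

print(game4_learner((2, 3, 1, 5)))  # 4, i.e. the hypothesis N - {4}
```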

Game 5
Set of hypotheses:
N, N − {1}, N − {2}, N − {3}, ..., where N = {1, 2, 3, ...}
(call these L∞, L1, L2, L3, ...)

Should you play?

How I make my text

Start with numbers from L1:
- your algorithm must eventually choose L1
Now add 1 to the sequence:
- your algorithm must change its choice
I can force changes after each choice.

The algorithm either changes infinitely often on L∞, and loses, or fixes on some hypothesis:
- if it fixes on Ln, I complete my text with a text for L∞
- if it fixes on L∞, I complete my text with a text for Ln

Gold's Theorem
Provided with just a text, any set of languages which contains all finite languages and at least one infinite language is not identifiable in the limit.

The Chomsky hierarchy

Languages             Machines
Computable            Turing machine
Context sensitive     Bounded TM
Context free          Push-down automaton
Regular               Finite state automaton
Finite

Gold's Theorem
Provided with just a text, any set of languages which contains all finite languages and at least one infinite language is not identifiable in the limit.
None of the classes of languages above "finite" in the Chomsky hierarchy is identifiable in the limit from just positive evidence (i.e., sentences in the language).
Learning language requires strong constraints on the set of possible languages.

What sort of constraints?

Option 1: a finite set of languages
- but what about {N − {x} | x ∈ N}? (infinite, yet winnable, as in Game 4)
Option 2: a set of languages in which each language L has a finite subset T such that, for every other language L′ in the set, if T ⊆ L′ then L′ is not a proper subset of L (Angluin, 1980)
- (weaker: no language is a subset of any other; a mechanical check is sketched below)
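The weaker condition is easy to check mechanically for small finite classes. A sketch (the class encodings are mine): Game 3's class passes, while Game 1's nested class fails the check even though Game 1 is winnable, so the check is conservative rather than exact.

```python
def no_subset_pairs(cls):
    """The slide's simpler condition: no language in the class is a
    proper subset of another, so each has distinguishing positive
    evidence."""
    return all(not (a < b) for a in cls for b in cls)

game3 = [frozenset(s) for s in
         ({2, 3, 4, 5}, {1, 3, 4, 5}, {1, 2, 4, 5},
          {1, 2, 3, 5}, {1, 2, 3, 4})]
game1 = [frozenset(range(1, n + 1)) for n in range(1, 11)]

print(no_subset_pairs(game3))  # True:  safe to play
print(no_subset_pairs(game1))  # False: nested, though Game 1 is still winnable
```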

Why subsets are bad

[Figure: the target language (red) drawn as a region inside the current hypothesis (blue)]

If the target language is a subset of the current hypothesis, no positive evidence can ever rule the (over-general) current hypothesis out.

Formal models of learning
(at the computational level)

Gold's theorem
- identification in the limit
Probably Approximately Correct learning
- more common in modern machine learning

What should we conclude from these models?
- do we accept the assumptions?
- do we accept the conclusions?

Assumptions in Gold's theorem

What constitutes a language:
- a set of grammatical sentences
What learners see:
- adversarial learning (the enemy makes the data)
- only positive examples
What constitutes learning:
- identification in the limit

Rules of the game

A set of hypotheses is fixed in advance.
You choose a learning algorithm:
- chooses a hypothesis from the set for any input
I choose a hypothesis from the set.
I choose a distribution over examples, D.
Error is the probability, under D, that the two hypotheses differ.
I generate N examples from D (positive and negative).
You win if the probability that the error is less than ε is greater than 1 − δ.

Probably Approximately Correct

There exists a finite N such that the probability that the error is less than ε is greater than 1 − δ.

Given a hypothesis space H, what should N be?

Probably Approximately Correct

There exists a finite N such that the probability that the error is less than ε is greater than 1 − δ.
For a hypothesis space H of size |H|, this is satisfied by

N ≥ (1/ε) (log |H| + log (1/δ))

(computed for example values in the sketch below)

Classes of languages in the Chomsky hierarchy are too large to be PAC learnable (including the class of finite languages).
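Plugging example numbers (mine, for illustration) into the bound: even a million hypotheses need only about 1,700 examples, so the slide's point is that the Chomsky hierarchy classes have unbounded size, for which the finite-|H| bound gives no guarantee at all.

```python
from math import ceil, log

def pac_sample_size(h_size, eps, delta):
    """N >= (1/eps) * (log|H| + log(1/delta)), using natural logs."""
    return ceil((log(h_size) + log(1.0 / delta)) / eps)

# e.g. |H| = 10^6 hypotheses, eps = 1% error, delta = 0.05 (95% confidence):
print(pac_sample_size(10**6, 0.01, 0.05))  # 1682
```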

Formal models of learning
(at the computational level)

Gold's theorem
- identification in the limit
Probably Approximately Correct learning
- more common in modern machine learning

What should we conclude from these models?
- do we accept the assumptions?
- do we accept the conclusions?

What should we conclude?

Learning language requires strong constraints on the set of possible languages.
These constraints are Universal Grammar
(although we haven't shown specificity to language).

Are we happy with this?
Rejecting these conclusions requires rejecting the underlying assumptions.

Objectionable assumptions:
- what constitutes a language (a set)
- adversarial learning (how texts are generated)
- what learners see (a text, samples from D)
- what constitutes learning (identification / PAC)

What should we conclude?

What is learnable depends strongly on our assumptions about language and learning
(we'll look at other assumptions later on).
Does learnability from infinite amounts of data help us answer Plato's problem?

For Thursday
Read Goodman (1955) on "grue"
