Learning

Lee Spector

lspector@hampshire.edu

Cognitive Science

Hampshire College

It's all Learned. (Or is it?)

It's all Learning. (Or is it?)

Example paradigms of symbolic learning systems:

- Learning classification rules
- "conjunct dropping"
- version spaces

- Learning by exploration

animal(*x*) & quadruped(*x*) &
tailed(*x*) & barks(*x*) -> dog(*x*)

[flies(*x*) & ~insect(*x*) &
~bat(*x*) & animal(*x*)] v penguin(*x*) v ostrich(*x*)

-> bird(*x*)

Tanimoto's classification rule induction example:

Safe Strings Warning Strings

s_{1}=DECCG w_{1}=CGCGF

s_{2}=AGDEC w_{2}=DGFCD

s_{3}=GCFDC w_{3}=ECGCD

s_{4}=CDFDE

s_{5}=CEGEC

M(*x*, *n*, *c*) is true if in string
*x*, the *n*th character matches *c*.

M(w_{1}, 1, C) & M(w_{1}, 2,
G) & M(w_{1}, 3, C) & M(w_{1}, 4, G) & M(w_{1},
5, F)

Variablize:

M(*x*, 1, C) & M(*x*, 2, G) &
M(*x*, 3, C) & M(*x*, 4, G) & M(*x*, 5, F)

-> warning(*x*)

Add a disjunct for every warning string

[M(*x*, 1, C) & M(*x*, 2, G) &
M(*x*, 3, C) & M(*x*, 4, G) & M(*x*, 5, F)]

v [M(*x*, 1, D) & M(*x*, 2, G) &
M(*x*, 3, F) & M(*x*, 4, C) & M(*x*, 5, D)]

v [M(*x*, 1, E) & M(*x*, 2, C) &
M(*x*, 3, G) & M(*x*, 4, C) & M(*x*, 5, D)]

-> warning(*x*)

Drop conjuncts that have no diagnostic value relative to the training set:

M(*x*, 5, F) v M(*x*, 5, D) -> warning(*x*)

Order matters. Considering the conjuncts from right to left results in:

[M(*x*, 1, C) & M(*x*, 2, G)] v [M(*x*,
1, D) & M(*x*, 2, G)] v M(*x*, 1, E)

-> warning(*x*)

To find the most general rule one might have to
try all *n*! orders, where *n* is the number of atomic propositions.

The combinatorics can be reduced by using domain-specific features to pre-classify subsets of inputs:

Tonic(*x*): true if all notes of *x*
are in {C, E, G}

Symmetric(*x*): true if the motif is a palindrome

Unfinal(*x*): true if the last note of *x*
is in {D, F}

[~tonic(*x*) & ~symmetric(*x*) &
unfinal(*x*)]

v [~tonic(*x*) & ~symmetric(*x*)
& unfinal(*x*)]

v [~tonic(*x*) & ~symmetric(*x*)
& unfinal(*x*)] -> warning(*x*)

Drop conjuncts:

unfinal(*x*) -> warning(*x*)

The "version spaces" method systematically
keeps track of *all* rules consistent with a set of positive and a set of negative
examples.

This can be reasonably efficient because one only needs to store the most specific and most general members.

Learning by Exploration

What does it mean to learn by exploration?

What is discovery?

What are discovery algorithms?

What have you learned lately?

How did you do it?

Piaget's Circular Reaction Paradigm

Phases of the learning cycle:

1. Exercising of actions in repertoire

2. Evaluating the results of those actions with particular attention devoted to detecting the unusual

3. Determining the relationship between the actions exercised and the unusual condition, possibly through repeated attempts to produce the unusual

4. Cataloging the new ability (or concept) as a member of the repertoire

AM [Lenat 1976]

Discovers concepts in elementary mathematics and set theory.

Not oriented toward any particular problem solving goal.

Searches for "interesting" mathematical concepts.

Conjectured, for example, the "fundamental theorem of arithmetic": every number has a unique prime factorization.

Architecture: frame representation system, extensive domain knowledge, production system, best-first search.

Example concept frame from AM

Name: Prime Numbers

Definitions:

origin: Number-of-divisors-of(x) = 2

predicate-calculus:

iterative: (for x > 1): For i from 2 to sqrt(x), ~(i | x)

Examples: 2, 3, 5, 7, 11, 13, 17

boundary: 2, 3

boundary-failures: 0, 1

failures: 12

Generalizations: Nos., Nos. with an even no. of divisors,

Nos. with a prime no. of divisors

Specializations: Odd Primes, Prime Pairs, Prime Uniquely-addables

Conjectures: Unique factorization, Goldbach's conjecture,

Extremes of Number-of-divisors-of

Analogies:

Maximally divisible numbers are converse extremes of

Number-of-divisors-of, Factor a nonsimple group in simple groups

Interest: Conjectures associating Primes with TIMES and with Divisors-of

Worth: 800

Example production rules from AM

If: The current task is "Fill in examples of X"

and X is a specialization of some concept Y,

Then: Apply the definition of X to each of the examples of Y

and retain those that satisfy the definition.

If: The current task is "Fill in examples of X"

and X has a recursive definition,

Then: Instantiate the base step of the recursion to get

a boundary example.

What AM's rules can do:

1. Fill in a slot S of some concept C.

2. Check slot S of concept C.

3. Create new concepts.

4. Add new tasks to the agenda.

5. Modify the interestingness of a task on the agenda.

AM's behavior emerges from the complex interactions of its many domain-specific rules,
but it tends to behave as if it were executing the following loop:

Repeat:

1. Select a concept to evaluate and generate examples of it.

2. Check these examples looking for regularities and:

a) update the assessment of the interestingness of the concept,

b) create new concepts, and

c) create new conjectures.

3. Propagate the knowledge gained to other concepts in the system.

EURISKO is Lenat's successor to AM

AM failed to generalize outside of mathematics.

EURISKO can modify its own rules, using a carefully designed rule-modification language.

The Accretion Model of Theory Formation

1. Given some new (not fully explored) definitions, objects, operations, rules, etc., immediately gather empirical data about them: find examples of them, try to apply them, etc.

2. As this progresses, try to notice regularities, patterns, and exceptions to patterns, in the data.

3. From these observations, form new hypotheses and modify old ones. In a world over which you have some control, design and carry out experiments to test these hypotheses.

4. As a body of conjectures develops, economize by making new definitions that shorten the statement of the most useful conjectures. The entire cyclic process now typically begins anew at step 1, given these new definitions as grist.

5. As the above loop (steps 1-4) proceeds, it will become necessary from time to time to abstract some new soecific heuristics, by compiling the learner's hindsight.

6. On even more rare occasions, it will become necessary to augment or shift the representation in which the domain knowledge is encoded.

7. For all steps in this model, even steps 5, 6, and 7, it suffices to collect and use a body of heuristics, informal judgmental rules which guide the explorer toward the most plausible alternatives and away from the most implausible ones.

Tanimoto presents a simple system called PYTHAGORAS that explores geometrical concepts.

PYTHAGORAS is too simple to generate particularly interesting concepts; about the best it can come up with is (NONZERO-AREA EQUAL-SIDES).

Through the extensions suggested in exercises 17-19 of chapter 10, PYTHAGRORAS can be made much more powerful:

generation and testing of conjectures

generation of unlimited numbers of examples

generation of unlimited numbers of predicates of particular forms

(e.g., (lambda (p) (= (length p) n)) for any n)

use of negations of predicates

use of functions to define new concepts

Scientific Discovery Systems

[Langley, Simon, Bradshaw, and Zytkow 1987]

The BACON series of programs derives quantitative scientific laws from collections of data. It has been used to rediscover the ideal-gas law, Kepler's law, Coulomb's law, Ohm's law, Archimedes' law of displacement, Snell's law of refraction, Black's law of specific heat, and conservation of momentum.

The GLAUBER system derives qualitative scientific laws from collections of data. It has been used to redescover the concepts of acids and alkalis from the data available to 19th century chemists.

The STAHL system builds componential models. It has been used to rediscover the oxygen theory of combustion.

Evaluating Scientific Theories

Thagard's ECHO system evaluates the "explanatory coherence" of a scientific theory [Thagard 1992].

Input: evidence, hypotheses, explanation structure

Output: the most coherent set of explanations for the evidence

Algorithm: build a network from the input and spread activation