Generalization and Network Design Strategies

Y. le Cun
Department of Computer Science
University of Toronto

Technical Report CRG-TR-89-4
June 1989

Send requests to:
The CRG technical report secretary
Department of Computer Science
University of Toronto
10 Kings College Road
Toronto M5S 1A4 CANADA
INTERNET: [email protected]
UUCP: uunet!utai!carol
BITNET: carol@utorgpu

This work has been supported by a grant from the Fyssen Foundation, and a grant from the Sloan Foundation to Geoffrey Hinton. The author wishes to thank Geoff Hinton, Mike Mozer, Sue Becker and Steve Nowlan for helpful discussions, and John Denker and Larry Jackel for useful comments. The neural network simulator SN is the result of a collaboration between Leon-Yves Bottou and the author. Y. le Cun's present address is Room 4G-332, AT&T Bell Laboratories, Crawfords Corner Rd, Holmdel, NJ 07733.

Y. le Cun. Generalization and network design strategies. Technical Report CRG-TR-89-4, University of Toronto Connectionist Research Group, June 1989. A shorter version was published in Pfeifer, Schreter, Fogelman and Steels (eds.), 'Connectionism in Perspective', Elsevier, 1989.

Generalization and Network Design Strategies

Yann le Cun *
Department of Computer Science, University of Toronto
Toronto, Ontario, M5S 1A4, CANADA

* Present address: Room 4G-332, AT&T Bell Laboratories, Crawfords Corner Rd, Holmdel, NJ 07733.

Abstract

An interesting property of connectionist systems is their ability to learn from examples. Although most recent work in the field concentrates on reducing learning times, the most important feature of a learning machine is its generalization performance. It is usually accepted that good generalization performance on real-world problems cannot be achieved unless some a priori knowledge about the task is built into the system. Back-propagation networks provide a way of specifying such knowledge by imposing constraints both on the architecture of the network and on its weights. In general, such constraints can be considered as particular transformations of the parameter space. Building a constrained network for image recognition appears to be a feasible task. We describe a small handwritten digit recognition problem and show that, even though the problem is linearly separable, single layer networks exhibit poor generalization performance. Multilayer constrained networks perform very well on this task when organized in a hierarchical structure with shift invariant feature detectors. These results confirm the idea that minimizing the number of free parameters in the network enhances generalization.

1 Introduction

Connectionist architectures have drawn considerable attention in recent years because of their interesting learning abilities. Among the numerous learning algorithms that have been proposed for complex connectionist networks, Back-Propagation (BP) is probably the most widespread.
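Since BP drives everything that follows, a minimal sketch of the algorithm may help fix ideas. The NumPy code below is not from the report (whose experiments used the SN simulator); the network sizes, the sigmoid nonlinearity, the squared-error loss, the toy task and the learning rate are all assumptions chosen here for illustration.

    # Illustrative back-propagation for a one-hidden-layer network.
    # All sizes, the learning rate, the loss and the sigmoid nonlinearity
    # are assumptions made for this sketch, not choices from the report.
    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Tiny network: 4 inputs -> 3 hidden units -> 1 output.
    W1 = rng.normal(scale=0.5, size=(3, 4))
    b1 = np.zeros(3)
    W2 = rng.normal(scale=0.5, size=(1, 3))
    b2 = np.zeros(1)

    def train_step(x, target, lr=0.1):
        global W1, b1, W2, b2
        # Forward pass.
        h = sigmoid(W1 @ x + b1)               # hidden activations
        y = sigmoid(W2 @ h + b2)               # network output
        # Backward pass: propagate error derivatives layer by layer.
        dy = (y - target) * y * (1.0 - y)      # output delta (squared error)
        dh = (W2.T @ dy) * h * (1.0 - h)       # hidden delta
        # Gradient descent on the weights.
        W2 -= lr * np.outer(dy, h)
        b2 -= lr * dy
        W1 -= lr * np.outer(dh, x)
        b1 -= lr * dh
        return float(0.5 * np.sum((y - target) ** 2))

    # Learn a toy mapping from a handful of binary examples.
    X = rng.integers(0, 2, size=(8, 4)).astype(float)
    T = (X.sum(axis=1) > 2).astype(float).reshape(-1, 1)
    for epoch in range(2000):
        loss = sum(train_step(x, t) for x, t in zip(X, T))
    print("final training loss:", loss)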
BP was proposed in (Rumelhart et al., 1986), but had been developed before by several independent groups in different contexts and for different purposes (Bryson and Ho, 1969; Werbos, 1974; le Cun, 1985; Parker, 1985; le Cun, 1986). Reference (Bryson and Ho, 1969) was in the framework of optimal control and system identification, and one could argue that the basic idea behind BP had been used in optimal control long before its application to machine learning was considered (le Cun, 1988).

Two performance measures should be considered when testing a learning algorithm: learning speed and generalization performance. Generalization is the main property that should be sought; it determines the amount of data needed to train the system such that a correct response is produced when it is presented with patterns outside of the training set. We will see that learning speed and generalization are closely related.

Although various successful applications of BP have been described in the literature, the conditions in which good generalization performance can be obtained are not understood. Considering BP as a general learning rule that can be used as a black box for a wide variety of problems is, of course, wishful thinking. Although some moderate sized problems can be solved using unstructured networks, we cannot expect an unstructured network to generalize correctly on every problem. The main point of this paper is to show that good generalization performance can be obtained if some a priori knowledge about the task is built into the network. Although in the general case specifying such knowledge may be difficult, it appears feasible on some highly regular tasks such as image and speech recognition.

Tailoring the network architecture to the task can be thought of as a way of reducing the size of the space of possible functions that the network can generate, without overly reducing its computational power. Theoretical studies (Denker et al., 1987; Patarnello and Carnevali, 1987) have shown that the likelihood of correct generalization depends on the size of the hypothesis space (the total number of networks being considered), the size of the solution space (the set of networks that give good generalization), and the number of training examples. If the hypothesis space is too large and/or the number of training examples is too small, then there will be a vast number of networks which are consistent with the training data, only a small proportion of which will lie in the true solution space, so poor generalization is to be expected. Conversely, if good generalization is required, then as the generality of the architecture is increased, the number of training examples must also be increased. Specifically, the required number of examples scales like the logarithm of the number of functions that the network architecture can implement.
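To make the notion of an architectural constraint concrete, the sketch below implements the simplest version of the strategy the paper advocates: a feature detector whose weights are shared across every position of the input, so that the same feature is detected wherever it appears. The 1-D input, kernel size and nonlinearity are assumptions for illustration only; the digit-recognition networks described later in the report use 2-D analogues of this idea.

    # Illustrative shift-invariant feature detector with shared weights.
    # Kernel size, input length and the sigmoid nonlinearity are
    # assumptions for this sketch; the report's networks use 2-D inputs.
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def shared_weight_layer(x, kernel, bias):
        """Apply one feature detector at every position of a 1-D input.

        Because `kernel` and `bias` are reused at every position, the
        layer has len(kernel) + 1 free parameters in total, instead of
        one independent weight vector per output unit.
        """
        k = len(kernel)
        out = np.empty(len(x) - k + 1)
        for i in range(len(out)):
            out[i] = sigmoid(np.dot(kernel, x[i:i + k]) + bias)
        return out

    x = np.zeros(16)
    x[5:8] = 1.0                           # a "feature" (a bump) at position 5
    kernel = np.array([-1.0, 2.0, -1.0])   # toy edge/bump detector
    print(shared_weight_layer(x, kernel, bias=-0.5))

    # Shifting the input shifts the response identically: the detector
    # responds to the pattern regardless of where it appears.
    x_shifted = np.roll(x, 4)
    print(shared_weight_layer(x_shifted, kernel, bias=-0.5))

However long the input, the shared kernel contributes only len(kernel) + 1 free parameters, which is precisely the kind of reduction in the number of free parameters that the abstract credits with improved generalization.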
