“Multilayer feedforward networks are universal approximators” (1989)

https://pdfs.semanticscholar.org/f22f/6972e66bdd2e769fa64b0df0a13063c0c101.pdf

Standard multilayer feedforward networks with as few as one hidden layer using arbitrary squashing functions are capable of approximating any Borel measurable function from one finite dimensional space to another to any desired degree of accuracy, provided sufficiently many hidden units are available. In this sense, multilayer feedforward networks are a class of universal approximators.
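As a small illustration (not from the paper), the "one hidden layer with a squashing function, enough hidden units" claim can be sketched numerically: fix random hidden-layer weights, use sigmoid hidden units, and solve for the output weights by linear least squares. The target sin(x) and the random-feature construction here are my own choices for demonstration; the paper's result is an existence theorem, not a training recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    # A squashing function in the paper's sense: monotone, bounded in [0, 1].
    return 1.0 / (1.0 + np.exp(-z))

def fit_random_features(x, y, n_hidden):
    # One hidden layer with fixed random weights and sigmoid activations.
    W = rng.normal(scale=2.0, size=(1, n_hidden))
    b = rng.uniform(-np.pi, np.pi, size=n_hidden)
    H = sigmoid(x @ W + b)                      # hidden activations
    # Output layer is linear; solve for its weights by least squares.
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    return H @ beta

# Approximate a continuous target on a compact interval.
x = np.linspace(-np.pi, np.pi, 200)[:, None]
y = np.sin(x)

for n in (5, 50):
    y_hat = fit_random_features(x, y, n)
    mse = float(np.mean((y - y_hat) ** 2))
    print(f"{n:3d} hidden units -> MSE {mse:.6f}")
```

With more hidden units the approximation error shrinks, in line with the "sufficiently many hidden units" proviso; with only a handful of units the fit is visibly worse.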

Minsky and Papert (1969) conclusively showed that the simple two-layer perceptron is incapable of usefully representing or approximating functions outside a very narrow and special class.

Any lack of success in applications must therefore arise from inadequate learning, insufficient numbers of hidden units, or the lack of a deterministic relationship between input and target.