One general question about intelligent systems is whether they will be dominated by "generative" models which explain data as a sequence of transformations, or by black-box machines that are trained on data at ever greater scale. In perception systems this boils down to the comparative roles of two paradigms: analysis-by-synthesis versus empirical recognisers.
Each approach has its strengths, and empirical recognisers especially have made great strides in performance in the last few years through “deep learning”. Exciting progress has already been made on integrating the two approaches. It is also fascinating to speculate what other new paradigms in learning might transform the speed at which artificial perception can develop.