Monday 16 December 2019

How deep is deep learning, really?

There’s a sense of history repeating among those following AI news. Time and again we are told that computers will reach the level of human intelligence, only to have the myth busted shortly thereafter. Take the latest hype in deep machine learning. We were told that machines would take over the world. The glamorous visionary Elon Musk literally told us that “AI is our biggest existential threat” (Gibbs 2014). Yet we’ve just scratched the surface of a new generation of challenges. Think of the seemingly trivial tasks, like parking, that stump self-driving cars (Marshall 2019), the huge crowds working behind supposedly automatic text processing algorithms (Dickson 2018), or the fancy fashion that turns out to be enough to mislead mass surveillance (Thys, Van Ranst & Goedemé 2019). We’ve found that it is not enough for computers to outperform humans in games - even sophisticated ones like chess, Go or StarCraft - for this to have any significant impact on how artificial intelligence compares to that of humans.

One useful way to understand this comparison better is to look at how intelligence is nurtured. The answer is surprisingly similar for both artificial and human intelligence, and - quite obviously - it is through learning. Yet, under the surface there are huge differences in what learning means in each case. And this is a gap that is definitely worth understanding better if we are to talk about what possibilities we have of closing it.

Let’s look closer at deep learning. It is named after the multitude of hidden layers of computation that aim to automatically discover complex derived features of the data being learned. We hear of amazing new developments on this front literally daily. Yet, there’s an interesting coincidence that is not much spoken about: the concept of deep learning also exists in education - the traditional kind, involving people - and was studied by John Biggs (1987), among others. There it is characterised by learners engaging in “seeking meaning, relating ideas, using evidence, and having an interest in ideas”, and contrasted with surface learning, where learners engage in “unrelated memorizing of information, confining learning of the subject to the syllabus, and a fear of failing”. Put in this perspective, one could question to what extent machine deep learning is actually seeking meaning, rather than applying numerical algorithms to what it receives as memorised information. What is clearer is that for the neural networks used in machine learning, relating ideas is scoped by the assumptions and design choices of the data scientists, and limited by the architecture of the network, as the sketch below illustrates. One could argue that such a scope definition corresponds very closely to a student confining their learning to the syllabus.
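
As a rough illustration - and only a sketch, with layer sizes and activations chosen purely for the example - this is how such an architecture is fixed in advance by human design choices:

```python
# A minimal sketch of where the "deep" in deep learning comes from:
# stacked hidden layers, each computing features derived from the previous
# one. All sizes and activations here are illustrative design decisions.
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),   # hidden layer 1: low-level features
    nn.Linear(256, 64), nn.ReLU(),    # hidden layer 2: combinations of those
    nn.Linear(64, 10),                # output layer: class scores
)
# Every "idea" this network can relate is confined to this fixed stack -
# the machine learning analogue of a syllabus set in advance.
```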

Consider how evidence is used. A prudent human learner would actively seek out the right evidence to shed more light on their own uncertainties. This relates to active learning, a collective term for machine learning approaches in which the algorithm asks users for help with the data points it cannot confidently classify on its own; it is typically applied in semi-supervised settings, where labelled data is scarce. Yet, it would be a stretch to claim that such algorithms endeavour to search for evidence that might be contradictory, or that they could reasonably decide when a paradigm shift would be appropriate, beyond a measure of better performance against an optimisation function. It is beyond their design to recognise when the optimisation function they use doesn’t adequately capture all the relevant features of the problem space.
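
To make this concrete, here is a minimal sketch of pool-based active learning with uncertainty sampling; the dataset, model, seed set and query budget are all illustrative assumptions:

```python
# Pool-based active learning via uncertainty sampling (least confidence).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, random_state=0)
# Seed with a few labelled examples of each class; the rest form the pool.
labelled = [int(i) for c in (0, 1) for i in np.where(y == c)[0][:5]]
pool = [i for i in range(len(X)) if i not in labelled]

model = LogisticRegression(max_iter=1000)
for _ in range(20):                      # 20 query rounds
    model.fit(X[labelled], y[labelled])
    probs = model.predict_proba(X[pool])
    # Query the pool point the model is least confident about.
    query = pool[int(np.argmax(1.0 - probs.max(axis=1)))]
    # Here a human oracle would be asked for the label; in this toy
    # setting we simply reveal the held-back ground truth.
    labelled.append(query)
    pool.remove(query)
```

Note that even here the algorithm only asks about points it already knows it is unsure of, within a fixed hypothesis space; it never questions the hypothesis space itself.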

To see where deep machine learning stands in the final contrast - between interest in ideas and fear of failing - we first need to understand these concepts somewhat better. Let’s turn to the work of a developmental psychologist, Carol Dweck. In her flagship work on mindset, she identifies two distinct approaches to learning among people, which she calls a growth mindset and a fixed mindset (Dweck 2017). Dweck describes the growth mindset as “stretching yourself to learn something new”, which - one could suggest - is a different way of describing someone interested in ideas, in particular new ones. In contrast, she describes the fixed mindset as defining success as “proving you’re smart or talented”. Such a predisposition is a natural prerequisite for the fear of failing itself, as failure would be perceived as evidence of the opposite of what one seeks to prove. For a machine learning algorithm, proving itself means, in mathematical terms, performing better than a baseline with respect to a given optimisation function. The growth mindset, by contrast, is about expanding the limits of one’s knowledge and developing the questions being asked. In formal terms this arguably corresponds to working to improve the optimisation function (i.e. the goal) itself. This points to one particular shortcoming of machine learning: despite the existence of techniques to escape mathematical local optima, contemporary deep learning algorithms are focused on finding a better (global rather than local) optimum of a given, quantified and fixed goal. They don’t engage with the possibility of refining, or even evolving, the question being asked when evidence accumulates that the current formulation is missing critical factors, which in turn might be distorting the result.
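
The point is easy to see in code. Below is a minimal sketch of gradient descent on a fixed loss (the data and learning rate are illustrative assumptions), in which everything can improve except the goal itself:

```python
# Gradient descent on a fixed, human-chosen loss (mean squared error).
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))       # features
y = X @ np.array([1.5, -2.0, 0.5])      # targets from a hidden linear rule

w = np.zeros(3)
for _ in range(500):
    grad = 2 * X.T @ (X @ w - y) / len(X)   # gradient of the fixed loss
    w -= 0.1 * grad                         # step towards a better optimum

# The loop can only chase better values of the loss it was handed. Nothing
# in it can notice that mean squared error might be the wrong question to
# ask, let alone propose a different one.
```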

Going back to the distinction between seeking meaning and unrelated memorising of information, one might ask what different types of knowledge one could develop. Turning back to John Biggs, in an article titled “Modes of learning, forms of knowing, and ways of schooling” (2005) he notes the existence of a multitude of forms of knowledge. Beyond the widely discussed know-that (declarative knowledge) and know-how (procedural knowledge), he considers a range of others. Some of these could be seen as variations of know-that, such as theoretical and metatheoretical knowledge - the latter relating to state-of-the-art research, where what is known might change. Others, such as tacit and intuitive knowledge, raise the question of whether machines could learn what is not explicitly given in the training data. Finally, there is a category widely referred to as conditional knowledge, which Biggs calls knowing-how-and-why. In another example from a specific - and admittedly very complex - domain, crime prevention, a much more nuanced picture of this category emerges. While discussing the challenges of implementing crime prevention and the reasons behind failure, Ekblom (2001) identifies knowledge categories like know-about crime problems, know-who to involve, know-when to act, know-where to dedicate resources, know-why, and know-how to put it all into practice. These question-based knowledge categories should serve as a hint of how difficult it is to learn and to know, be it for a person or a machine.

Yet, I’m in no way implying that the difficulty of defining the scope of knowledge should stop us from trying to develop better machine learning and expand our own limited horizons. On the contrary: instead of making unfounded claims about the fantastic future of artificial intelligence, I find it more valuable to identify pending real-life problems that could realistically be solved with the current state of the art in machine learning. In the process of solving them, we can further push the boundaries of what we know and what we can do.

I see this article as a conversation opener for a range of different topics. One of them: how machine learning can address know-how knowledge with the help of process mining.
