Thursday, 19 December 2019

How can we teach machines to perform better?

In my previous article, I provided examples that illustrate how far AI (in practice, machine learning) is from what some like to call general intelligence. I illustrated the gap between what is called deep learning for machines and for people. I also gave examples of forms of knowledge that at this point appear out of reach for AI, some of which are difficult even to define. In this article, I explain how to programmatically address one particular category of knowledge that is difficult for machines to tackle - what is popularly known as know-how.

To this end, let's take a step back and consider one simple categorisation of knowledge that captures well the aspects that are challenging for machine learning. The categorisation in question is the one introduced by David Perkins in his article titled “Beyond Understanding” (2008). Although Perkins limits himself to just three categories of knowledge, he finds extremely powerful wording for them: possessive, performative and proactive knowledge. Possessive knowledge answers know-that questions, performative knowledge answers know-how questions, and proactive knowledge answers know-where-and-when questions. Possibly, proactive knowledge also encompasses know-why. With a generalisation that admittedly goes too far, we can say that machines already outperform people in possessive knowledge, thanks to digital memory that allows for practically unlimited storage. However, even when this is combined with the current impressive growth of computational power, it leaves a lot to be desired when it comes to performative knowledge, and despite lots of ambition, we’ve barely scratched the surface of machines successfully engaging with proactive knowledge.

So, let’s consider some examples that could give us an idea of where we stand with performative knowledge. Deep machine learning is not just smarter number crunching: it is becoming more aware of the relevant context and inherent structure of its application domains. However, it is already becoming evident that it is no longer enough to test an algorithm’s ability via standard datasets and metrics, such as ImageNet, BLEU or FaceForensics (Thomas 2019). Much like standardised tests for humans, these encourage superficial memorisation of the test answers, rather than learning to reason about the actual subject. Measuring machine learning would need to grow into learning assessment. To this end, we would need ways to better understand how machine learning algorithms reason, through what has become known as explainable AI. This is what would allow us to engage with algorithms in what organisational learning commonly calls double-loop learning (Argyris 1991): a discussion of meaning and of how it relates to other ideas, which would enable algorithms to improve the machine learning models themselves, not just their outcomes. Such an approach is not unlike the hierarchical learning that is currently used, but it would engage in interactions aimed at refining the hierarchies used. Taking a possibly different view of the same problem, we would need to look at the interplay of questions for relevant insights, as opposed to searching for individually optimal answers that might lead towards suboptimal solutions. To this end, we might look into borrowing techniques from analytics and think about how we could compose their results to provide more nuanced and elaborate answers that lead to more insightful findings. Although this might implicitly be the direction in which deep learning is already evolving with the birth of more sophisticated learning architectures, individual design choices would need to be defended in detail and rooted much more deeply in particular contextual evidence.

One particular approach that takes advantage of contextual information comes from traditional industrial management: process mining, which is designed to address performative knowledge. To do this, models are built around the temporal structure of the information provided, i.e. around processes. Process mining is a broad term for a range of techniques that use event logs to discover and analyse business processes. In practice, this approach is applicable to any collection of events. Process mining relies on structured data composed of timestamped combinations of activities and case identifiers (think e.g. of names, reference identifiers or tracking numbers). Process discovery models the path of each case through the recorded activities. Including the temporal assumptions that arise from the underlying process allows for a broad range of interpretations. Clearly among these is the ability to answer a variety of performative - or if one still prefers, know-how - questions. I consider some examples of that below. The interpretative potential of a generic process mining dataset expands hugely when additional information about events is provided - like duration, performing actor, valuation, location, or any particular expectations. Other variations of process mining work with data that provides less than the generic assumptions normally required. For example, when the precision of the collected data is questionable, only the generic ordering of events could be analysed instead of specific timestamps. Activities might be unlabelled, with their identity derived as a composite signature of the other attributes they have. Identifiers could be partially or completely omitted, for example when working with energy or financial flows.
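
To make the data requirements above more concrete, here is a minimal sketch in Python of what a very basic form of process discovery might look like. It is an illustration under stated assumptions, not the method of any particular process mining tool: the event log, the case identifiers, the activity names and the helper functions `build_traces`, `directly_follows` and `variants` are all hypothetical. The sketch orders the events of each case by timestamp, reconstructs the path each case has taken, and counts how often one activity directly follows another.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical event log: (case identifier, activity, timestamp).
# Events are deliberately not in chronological order for case-3.
EVENT_LOG = [
    ("case-1", "register", "2019-01-01T09:00"),
    ("case-1", "review", "2019-01-02T10:00"),
    ("case-1", "close", "2019-01-03T11:00"),
    ("case-2", "register", "2019-01-01T09:30"),
    ("case-2", "close", "2019-01-04T12:00"),
    ("case-3", "review", "2019-01-06T08:00"),
    ("case-3", "register", "2019-01-05T08:00"),
    ("case-3", "close", "2019-01-07T08:00"),
]


def build_traces(event_log):
    """Group events by case and order each case's activities by timestamp."""
    by_case = defaultdict(list)
    for case_id, activity, timestamp in event_log:
        by_case[case_id].append((datetime.fromisoformat(timestamp), activity))
    return {
        case_id: tuple(activity for _, activity in sorted(events))
        for case_id, events in by_case.items()
    }


def directly_follows(traces):
    """Count how often activity b comes immediately after activity a."""
    counts = Counter()
    for trace in traces.values():
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


def variants(traces):
    """Count how many cases share exactly the same path."""
    return Counter(traces.values())


if __name__ == "__main__":
    traces = build_traces(EVENT_LOG)
    for (a, b), n in sorted(directly_follows(traces).items()):
        print(f"{a} -> {b}: {n}")
    for path, n in variants(traces).most_common():
        print(" > ".join(path), n)
```

Only the relative order of events within a case is used here, so the same sketch would work if the timestamps were replaced by plain sequence numbers, which corresponds to the ordering-only variation mentioned above.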

For some examples, consider crime prevention, the application domain that I mentioned in my article on deep learning and types of knowledge. Various machine learning techniques are already used for surveillance (Moody 2019), crime mapping (Greene 2019) or fraud prevention. However, widespread approaches are arguably limited to identifying patterns that could hardly be considered transferable. This is exactly because there are limits to the depth at which behaviour can be captured without building on the structural properties inherent in the sequential character of processes. With process mining, both the personal and the situational circumstances of public nuisance can be captured by the model. As one way to address the personal aspects of crime, consider offender reintegration. Educational process mining (Bogarin, Cerezo & Romero 2017) can be used to identify patterns of support for ex-offenders. This would allow identifying important reintegration activities whose completion is indicative of successful reintegration. It would also point to better reintegration paths that have actually been walked by others, thus revealing role models, both among ex-offenders and among the social workers involved. This is one possible approach to answering the know-who-to-involve question. As for the situational circumstances, the Italian traffic police (Leuzzi, Del Signore & Ferranti 2017) has identified a number of applications of process mining - such as conformance verification and traffic forensics - where the answers to know-where and know-when questions could provide invaluable insights that are lacking in current approaches. All this potential is yet to be tapped, and we have arrived at the point where we can do so. A range of open source and commercial process mining tools have reached a level of maturity that allows their wide use in several production and service sectors.
At this point, the only way to reach the limits of what can be achieved with process mining is to try to apply it to new domains.

* The author of this blog is currently working at myInvenio Srl, a company whose flagship product is a process mining suite, combining applications for automatic process discovery, simulation and analytics.
