This blog post describes the integration of deep learning, logic and probabilistic reasoning to enable advanced artificial intelligence tasks. The combination of these fundamentally different AI approaches will be one of the key advances supporting AI-driven business processes in the coming years. Furthermore, I describe challenges for operating such complex AI systems that leverage two or more different AI approaches.
The main motivation behind this is that while deep learning networks have been successful in many AI tasks, they have difficulties incorporating common knowledge – especially if this common knowledge represents abstract concepts that are not further detailed in the training data but assumed to be known.
For instance, if you want to teach a deep learning network that if it is raining then the floor is wet, you need thousands, tens of thousands or more training examples (depending on the setup, sensors etc.). Of course, those need to be of high quality, and you need to ensure a constant flow of new training examples to monitor and retrain your deep learning solution. Furthermore, common knowledge may also incorporate future changes, based on the experience of people, that are not visible in the data. For example, if farmland experiences a severe, previously unseen drought, then it may be suitable only for a completely new type of crop that has not been seen in the data before but is known to withstand such a drought. Additionally, many deep learning approaches have issues learning simple things, such as the addition of arbitrary natural numbers. On the other hand, logic and probabilistic reasoning are based on human common knowledge, and it can thus be complex to properly formulate with them the complex non-linear relationships in large datasets that are more suitable for deep learning. Logic and probabilistic reasoning are related to the domain of AI Planning and Scheduling.
Obviously, this type of common knowledge is known, and we should not bother a deep learning network to learn it; instead, we would like to integrate this kind of common knowledge into the network. It turns out that this is quite cumbersome, with a lot of side effects, if you use deep learning networks only. For example, there is little control over how data reflecting common knowledge affects the learning of specific tasks. Also, common knowledge would require a lot of diverse and very different data, obfuscating the data underlying the real task. Sometimes we may also not have the data, or the right distribution of data, reflecting common knowledge. Humans learn common knowledge in schools, universities or continuous education as a set of complex interrelated facts about the world.
Furthermore, deep learning is purely about predicting; it has no integration of prescriptive aspects, i.e. the automated intelligent decisions for actions based on insights, as available in AI Planning and Scheduling. Unfortunately, not many people combine different approaches, as most data scientists have a very narrow focus on deep learning/machine learning and are thus very limited in unleashing the full AI potential. Recent research, however, shows very promising results (e.g. in NLP) on the integration of those approaches, and every data scientist should learn to leverage this – despite its complexity. The intuition why the integration can work very well is simple:
- The AI can be decomposed into multiple less complex units that address different parts of the problem and are composed using logic/probabilistic programming; or the other way around, where the global part is represented by a deep learning model and local units by stochastic models
- Missing training data can be generated from an existing set of logic/probabilistic rules
- NLP Embeddings/Language Models/Knowledge Graph embeddings or Image embeddings can be learned from logic/probabilistic reasoning between concepts to increase performance of deep learning tasks (see examples here)
- Feature Extraction can be guided by probabilistic/logic rules
- Learning can be steered (e.g. constraint-driven learning)
- Transfer learning: Concepts from one domain can be linked to concepts of another domain using probabilistic/logic rules in a complex and abstract way – as humans do. Thus transfer learning can be facilitated. For example, concepts in one language, e.g. German, can be mapped to concepts in another language, e.g. English
- Integrating cross-media (e.g. audio, video, text, geography) learning data
- The optimisation during the learning phase can be guided by logic/probabilistic rules, making it easier to reach better solutions
- Examples that are noisy/inconsistent can be dropped, thus improving the model quality
- Active Learning: Determine which information is missing or where there is too little of it
- Dynamic supervised classification: Classify data even if the class did not exist at training time
- Deep learning can be used to learn logic/probabilistic statements about the world (e.g. from a text or image) and logic/probabilistic reasoning can be used to derive additional information not existing in the data
- Reasoning on deep learning models: We can assure that a deep learning model adheres to certain properties, such as safety (can it be "tricked"?), certainty (how good is the model given an unknown input), non-discriminatory aspects, consistency of the predictions or stability of the predictions over time given influence factors (e.g. trends, evolution etc.). This is especially important in the area of compliance.
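To make one of the points above concrete – generating missing training data from rules – here is a minimal sketch in Python. The rule, feature names and the 50/50 weather prior are purely hypothetical, and the rule is deliberately simplified (the floor is wet exactly when it rains):

```python
import random

def rain_wet_rule(raining: bool) -> bool:
    # Common-knowledge rule from the introduction: if it is raining,
    # then the floor is wet (simplified to an if-and-only-if here).
    return raining

def generate_examples(n: int, seed: int = 0):
    """Generate labeled training examples from the rule instead of sensors."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        raining = rng.random() < 0.5  # assumed 50/50 prior for illustration
        examples.append({"raining": raining,
                         "floor_wet": rain_wet_rule(raining)})
    return examples

data = generate_examples(1000)
# Every generated example is consistent with the rule by construction.
assert all(ex["floor_wet"] == ex["raining"] for ex in data)
```

Such synthetic examples can then be mixed into the real training set so the network does not have to rediscover the rule from scratch.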
There seem to be benefits for forecasting (e.g. here), natural language processing (e.g. here) and image analysis (e.g. here). However, extensive research is needed and has only partly been done to prove that this is the case. The research can be found under various names, such as deep logic programming, deep probabilistic programming, informed machine learning, neural-symbolic learning, neural association models, or statistical relational learning.
In this blog post I will describe how this common knowledge can be represented using logic and/or probabilistic reasoning. We will see why using only logic/probabilistic reasoning is challenging with real-world data. Afterwards, I will explain how these approaches can be integrated with deep learning networks based on the latest research. Then, I will explain the complexity of a system that integrates those advanced models and makes them available to users; this is especially challenging when operating such a system. Finally, I will conclude with some tools related to this. The integration of deep learning approaches with other AI approaches is a trending, but also very complex topic. However, it shows that superior AI applications can be realized by doing so.
Where does common knowledge come from?
Common knowledge and common sense come from the real world, where humans experience certain things or are simply told common things about the world by their teachers, friends, parents etc. This kind of knowledge might be encoded to a minimal extent in textbooks read in school, but to a lesser degree than other information. In fact, school textbooks require that the people reading them have a certain level of common understanding. Common knowledge may also be inherently part of the culture of a people. It is mostly not encoded in a fashion that is accessible to machines – unless the machines were part of the real world as humanoids that experience the world similarly to humans – with all the consequences.
Common knowledge can be correct, incorrect, probable, improbable, accurate, inaccurate or a lossy generalisation – some of the time or all the time. It simplifies most of our decision making, but can also make it complex. Not only does it not come purely in the form of concrete facts; we also cannot assign concrete probabilities to those facts. That is aside from the issue that humans have problems dealing with probabilities in their lives (e.g. how does your life change given a probability of 4% to die in a car crash?). Humans act irrationally and will continue to do so, as the world is uncertain. An AI needs to take this into account – a purely rational form of AI will lead to artificial behaviour. Nevertheless, for an AI, humans may provide common knowledge in a special form so that it can be used with other methods (e.g. deep learning) to improve the AI.
One of the most sophisticated approaches nowadays to encode common knowledge into a form that can be processed by an AI solution is Wikidata, which is the foundation of many intelligent applications, such as Google and Bing Search. Those are rooted in the concepts of knowledge graphs. Another interesting aspect of Wikidata is that it is based on a collaborative approach to manage the knowledge base, which is a key success factor to be used in an AI. Users that use the AI will need to contribute to the knowledge base – not data scientists. Other knowledge bases are, for example, DBpedia.
An example
In this section I will describe an example to illustrate the whole approach. Let us assume you have a text from Wikipedia about a famous person and want to extract facts from it. There are plenty of famous people on Wikipedia, so I just pick Grace Hopper as an example here. Let us assume we want to extract facts such as where she worked and lived during her life. Of course, you may want to extract other types of information or use other text sources, such as news articles. However, the same points described here in this post apply to them.
As always with difficult problems for AI, for a human this would be an "easy" task. Of course, it would require some exploration, e.g. the sentence "in 1949, Grace Hopper became an employee of the Eckert–Mauchly Computer Corporation". Even if this sentence is read out of the context of the article, and even if they do not know Grace Hopper, most humans would extract the following facts (see also the Wikidata page of Grace Hopper):
- Grace Hopper is a person (=> humans are employees)
- Grace Hopper is very likely female (=> Grace sounds female)
- Grace Hopper lived in 1949 (=> to be able to work you have to live)
- Grace Hopper worked for the organisation „Eckert-Mauchly Computer Corporation“ (you work usually in an organisation)
- Grace Hopper worked at least at some point in time in IT (due to the name of the organisation), but the exact role is unknown (given the sentence only)
- Grace Hopper is most likely not alive anymore today (2020), or is very old, as you need to have a certain age to work for an organisation and the statement is about 1949
- Grace Hopper was most likely an adult in 1949 (=> to work in an organisation as an employee you are most likely an adult)
- Grace Hopper might have studied – as this was common at that time for working in IT (depending on the role)
If we had more information about the organisation or other people working there, we could have guessed many more of those aspects. Deep learning networks alone would have problems capturing this common knowledge – no matter how much text you gave them to digest. First of all, not all text in current existence covers common knowledge. If you think about IT projects – how much do they rely on knowledge existing only in the heads of people – that is why successful IT projects distribute knowledge across several heads. Second, some common knowledge might be expressed only marginally in text. In the context of deep learning networks, that means it is simply noise that might very well be ignored. For instance, the facts that you in most cases need to be an adult to work, or that in 1949 you would most likely have studied to hold an IT job, which is again different from the 2000s. Finally, some texts are also very badly written, and they disturb the deep learning network. Humans, on the other hand, can ignore those texts, as they do not follow a logical argumentation or they contradict each other.
Needless to say, logic/probabilistic approaches alone would also have issues addressing facts qualified by "might", "was most likely" or "is most likely". Logic, for example, works with facts which are either true or false, and there is no grey area (except in fuzzy logic, to some extent). Probabilistic reasoning faces the problem that no really concrete probabilities can be assigned. For example, what probability does "might" represent exactly? Is it 60%, 40% or a different percentage – relative to what? Is this even relevant for analysing the facts?
Preliminaries: Logical reasoning and Probabilistic inference
Logical reasoning and probabilistic inference should be distinguished as two different fields. I will start by describing them individually and explain afterwards how they can be merged based on recent research. While even the merge of those two fields alone is beneficial, the focus of this article is the deep learning integration. An example of such an integrated approach are Markov logic networks.
Logical Reasoning
Logic reasoning in general is about drawing conclusions from existing facts/logic statements. Logic programming was introduced in computer science many decades ago and subsequently optimised. It feeds many contemporary AI systems. Basically, in the AI context, you can express goals or facts and derive different ways to reach them given a situation. This is also known as reasoning. In the following example, we can determine whether a person is at home given the logic statements in the beginning and the facts at the end.
athome(X) :- human(X), not businesstrip(X).
human(barackobama).
human(michelleobama).
businesstrip(michelleobama).
athome is a logic statement (a rule) that combines two facts using the logic operator AND (the comma) and the logic operator NOT.
We can now ask a question or define a goal to determine what is going on. For example, we want to find out who is at home. You simply specify in the logic interpreter the following (try it out here).
?- athome(X).
The logic interpreter now applies a reasoning process and finds all possible solutions to the problem given the statements and facts. In this case, it would find "barackobama". Of course, you can dynamically add/remove facts and statements to the logic program and start reasoning again. In the background, backtracking over Horn clauses (chosen for reasons of tractability) is usually used.
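For readers more familiar with mainstream languages, the reasoning in this toy program can be emulated in a few lines of Python. This is only a sketch of the semantics – a real Prolog interpreter uses unification and backtracking rather than set lookups:

```python
def athome(person, humans, businesstrips):
    """Python analogue of: athome(X) :- human(X), not businesstrip(X)."""
    # "not businesstrip(X)" is negation as failure: the fact is simply absent.
    return person in humans and person not in businesstrips

# The facts from the logic program above.
humans = {"barackobama", "michelleobama"}
businesstrips = {"michelleobama"}

# Query ?- athome(X): enumerate all known individuals and keep the solutions.
solutions = {p for p in humans if athome(p, humans, businesstrips)}
print(solutions)  # {'barackobama'}
```

Note how negation is handled as "absence of a fact", which mirrors the closed-world assumption of logic programming.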
This is of course just an illustrative example, and real AI use cases have many rules with more complex interdependencies and facts. Many use cases exist in the medical domain to perform intelligent diagnosis or anomaly detection, or in the space/aviation industry (e.g. to write safe navigation software for which it can be proven that it always works). The advantage is that an AI does not need to learn something which is common knowledge, or deal with cases where there is little training data. One provides the common knowledge, and the AI can reason on it to fulfil all tasks according to an objective or to answer questions.
It can be applied to (nearly) any type of problem. Furthermore, with this kind of approach it is possible to detect hidden knowledge in an existing set of logic/probabilistic rules by means of reasoning and inference, which is very difficult to extract from, for instance, deep learning networks. Those facts can easily be found by common logic programs such as Prolog.
One issue, though, is that logic statements in real-world examples are usually conflicting, i.e. based on real-world facts there is no solution to a logic program even though in the real world there must be one.
Probabilistic inference and stochastic processes
Probabilistic inference formulates facts as a set of random variables. Contrary to logic reasoning, there are no certain facts; all facts are associated with some uncertainty. This should not be confused with fuzzy logic, which deals with partial truth values that cannot be directly translated to probabilities. You can, for example, interpret fuzzy logic as vagueness.
The facts (random variables) are related to each other using Bayesian inference, which is based on the Bayes theorem.
$latex \Pr(A|B)=\frac{\Pr(B|A)\Pr(A)}{\Pr(B|A)\Pr(A)+\Pr(B|\neg A)\Pr(\neg A)}$
The inference basically gives the probability of an event A given another event B. The likelihood of those events may be given using more or less complex probability distributions. Of course, those events can be nearly arbitrarily chained or run independently of each other. This rule paved the way for a wide range of stochastic models.
An example of using probabilistic reasoning is the following. Let us assume the probability of catching a disease is P(Disease) = 0.5. Now, let us assume new information becomes known, e.g. a person has problematic preconditions. Then, the probability of catching the disease is higher: P(Disease|Problematic preconditions) = 0.8. Given now that a person has the disease, one can calculate how probable it is that the person has problematic preconditions. This method also allows modelling sequences of events (e.g. stochastic processes).
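This example can be computed directly with Bayes' theorem. Note that the computation also needs the prior probability of the preconditions, which the example does not state; P(Problematic preconditions) = 0.3 below is an added assumption for illustration:

```python
# Numbers from the running example; the prior of the preconditions is an
# added assumption, since Bayes' theorem needs the prior of the evidence.
p_disease = 0.5                 # P(Disease)
p_precond = 0.3                 # P(Problematic preconditions), assumed
p_disease_given_precond = 0.8   # P(Disease | Problematic preconditions)

# Bayes' theorem:
# P(Precond | Disease) = P(Disease | Precond) * P(Precond) / P(Disease)
p_precond_given_disease = p_disease_given_precond * p_precond / p_disease
print(round(p_precond_given_disease, 2))  # 0.48
```

So under these assumed numbers, observing the disease raises the probability of problematic preconditions from 30% to 48%.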
Given the logic example above, one may think that logic programs and Bayes' theorem have some similarities. However, from a practical standpoint they are not easily interchangeable. It depends on your use case. AI programs, especially prescriptive ones that, given a goal, should find the set of tasks to reach that goal, are very difficult to design using a probabilistic approach. In particular, integrating "common sense" knowledge very often leads to unpredictable but probable behaviour, which makes them less general. It may also be about "practical" considerations – what is intractable in one approach might be tractable in the other.
Then, for an AI to act, probabilistic reasoning does not work very well. While one could choose the most probable outcome, it is very difficult, for instance, to decide between outcomes of similar probability, e.g. 40% and 41%. Even more difficult is the case when all outcomes have a very low probability (e.g. due to the sheer number of events). So while logic reasoning is good for determining the next actions in an intelligent way, probabilistic reasoning is more suitable for predicting several decision outcomes in case of uncertainty.
Combining logic and probabilistic reasoning
Given the limitations of logic and probabilistic reasoning described before, there have been attempts to integrate both. Basically, the motivation is that logical reasoning is extremely useful to reliably reason on a set of actions, while probabilistic reasoning allows introducing uncertainty of information into the process. However, you do not want to express everything as uncertainty, as this might eventually lead to very low probabilities from which you cannot draw any useful conclusions. There are different ways to do an integration, and I state here only one of the most common (Bayesian logic). Find here also an interactive tutorial to experiment yourself. Basically, one describes facts with certain probabilities and uses logical reasoning rules to determine their probability. We illustrate a very similar example to the one above, but integrating logic and probabilistic rules (try it out yourself). In particular, we add uncertainty as to whether a person is at home or on a business trip.
athome(X) :- human(X), not businesstrip(X).
1.0::human(barackobama).
1.0::human(michelleobama).
0.5::businesstrip(michelleobama).
We can now query the probability that a certain person is at home
query(athome(X)).
And we get as an outcome certain probabilities
Of course, this is a simple example, but it shows how powerful the approach is, especially when you add more complex inference and logic statements.
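For this small program, the query probabilities can also be worked out by hand. Assuming, as ProbLog does, that the probabilistic facts are independent, a few lines of Python reproduce the result:

```python
# Probabilistic facts from the ProbLog program above.
p_human = {"barackobama": 1.0, "michelleobama": 1.0}
p_businesstrip = {"michelleobama": 0.5}  # absent facts have probability 0

def p_athome(person):
    """P(athome(X)) = P(human(X)) * (1 - P(businesstrip(X))),
    assuming the two probabilistic facts are independent."""
    return p_human.get(person, 0.0) * (1.0 - p_businesstrip.get(person, 0.0))

for person in sorted(p_human):
    print(person, p_athome(person))
# barackobama is at home with probability 1.0,
# michelleobama with probability 0.5
```

This hand calculation only works because the program has a single rule; in general, ProbLog's inference engine sums over all proofs of a query.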
Limitations of logic and probabilistic reasoning
While logic and probabilistic reasoning sound like an excellent choice for AI applications, and they obviously combine the best of both worlds, they have drawbacks:
- Both can easily become intractable – i.e. it takes a prohibitively long time to compute them, and most likely there will never be a hardware device that relaxes the situation (even a quantum computer may not help here). Hence, in reality the AI engineer is limited to Horn clauses or other types of tractable clauses in the case of logic reasoning. For probabilistic inference, one is limited to (stochastic) variational inference and similar techniques.
- They may lead to the definition of undecidable problems, although in practice a decision is needed and somehow also made by humans
- Both have problems scaling to very large datasets due to inherent computational complexity. The acceleration via special hardware is at the moment very limited and potentially also not possible.
- Textual facts, images and, to a lesser extent, numeric facts cannot be easily represented. For example, in text a common problem is that the same concept can be described in many different ways (think about company names or the activities of a company). For integration into logic/statistical reasoning software, this means you need to harmonise the texts (difficult) or write many different special rules for all possibilities in the text (complex, and it limits reasoning).
- It can be very complex to define probabilities – for instance, determining the probability in our example that Michelle Obama is on a business trip. One may, for instance, have a deep learning algorithm that goes through her emails and determines with a certain probability that she is on a business trip. However, this means deep learning needs to be integrated.
- A complex set of statements is very difficult for human beings to maintain and analyze. On real-world data, a lot of exceptions have to be encoded (think about the many different ways to write a company name, a place etc.). However, ontologies such as Wikidata or DBpedia are crowdsourced by millions of people and show that a collaborative approach can help here.
- It is difficult to deal with conflicting information, especially in logic reasoning. This is a bit relaxed with probabilistic reasoning, but may lead to strange side effects.
- Logic-only statements are very strict – they either always apply or never apply. Conflicting facts are difficult to resolve. This is somewhat relaxed by probabilistic reasoning. However, strictness can sometimes also be an advantage for an AI application, as the resulting error is negligible compared to the benefit that reasoning brings.
- Requires a lot of expertise to design and handle as two complex AI approaches are integrated.
- GPUs and similar hardware acceleration for deep learning models cannot be applied or only applied to a limited extent
Preliminaries: Deep Learning
Most of the concepts underlying deep learning have existed for decades. Some of them are being rediscovered after getting lost in AI winters. Others are more recent research advances on those models. For the last 10-15 years, deep learning has been practically applicable to larger data volumes, and thus it became usable for a wider audience and more use cases. All deep learning approaches have in common that they heavily leverage complex artificial neural networks consisting of several layers and different types of neurons, as illustrated in the following figure. There, different types of neurons are characterized by different symbols.
Deep learning has advantages:
- Can process large scale unstructured data, such as image, audio, video or text data
- Very suitable for a large range of tasks, from speech recognition and drug discovery over image-based cancer detection to financial fraud detection
- Can be accelerated by hardware to deliver fast results
Deep Learning suffers though from several issues:
- A lot of data is usually needed, especially to encode common "world" knowledge. This specific knowledge might not be encoded in any text or any accessible storage. For example, when developing and operating a complex piece of software, there is still some common knowledge that exists only in the heads of people or in "noisy" channels such as email inboxes or document management systems. You cannot simply dump all the data you have into a deep learning model and expect it to figure out the right things by itself. A high-quality data, feature extraction and selection process is needed; otherwise your model will not work.
- Deep learning models are subject to various security attacks. For instance, you can easily trick common object recognition networks by adding hidden filters that turn a stop sign into a speed limit sign, or a mosquito into an elephant.
- Deep learning models are not perfect – independent of what accuracy score they have when they are developed. In production you will face a lot of mis-predictions, and it is crucial to have a process for addressing them when they occur. This cannot always be solved by simple retraining, as retraining has other side effects, such as changes to existing classifications.
- Many deep learning models are “niche” models – they fit only to a certain type of tasks or few tasks. This limits their applicability.
- Deep learning networks have issues with, or possibly cannot at all, learn causal relations (events) or mathematical operations, such as the sum of two arbitrary natural numbers.
- Deep learning networks are energy hungry and thus not very sustainable.
- Deep learning models cannot determine a relevant set of future actions to be triggered based on a common goal.
Deep Probabilistic Logic Reasoning
Given the advantages of deep learning and also its disadvantages – what happens if we integrate deep learning and probabilistic logic reasoning? Let us find out in the following paragraphs.
There are several approaches to integrating logic reasoning, probabilistic inference and deep learning, as I have explained in the beginning. I illustrate one of them here: DeepProbLog. There are also similar popular frameworks (e.g. Pyro or Edwardlib), as I will explain below.
Basically, DeepProbLog is based on the ProbLog approach described before, but extended with deep learning. All the probabilities that you have seen in the previous approach can be replaced by a deep learning network returning those probabilities. For example, the probability that Michelle Obama is at home can be determined by a deep learning network analysing her emails and trip information. Then, further common knowledge can be used, as described above, to determine whether she is at home or not.
Another example use case from the DeepProbLog paper is that one can train a deep learning network to recognize digits in an image and use probabilistic logic statements to perform mathematical operations on top of them. This is a case which would not have been possible with either of the approaches individually. Here is the example from the paper:
Basically, the digits 3 and 5 are images, and the numbers are recognized through a simple deep learning network which provides the probability that they are those numbers. Addition is a logic statement that defines the result of the addition. The authors claim that this also works for additions on which the deep learning network has not been trained, i.e. with different combinations of numbers and arbitrary chains of several numbers.
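The core of this combination can be sketched without any framework: given two per-digit probability vectors (as a digit classifier would output), the logic layer effectively marginalizes over all digit pairs to obtain a distribution over the sum. A simplified Python illustration – the probability vectors below are made-up network outputs, not real classifier results:

```python
from collections import defaultdict

def sum_distribution(p_digit1, p_digit2):
    """Combine two independent per-digit probability vectors into a
    distribution over their sum, which is what an addition predicate
    reasons over in a DeepProbLog-style program."""
    p_sum = defaultdict(float)
    for d1, p1 in enumerate(p_digit1):
        for d2, p2 in enumerate(p_digit2):
            p_sum[d1 + d2] += p1 * p2  # marginalize over all digit pairs
    return dict(p_sum)

# Hypothetical network outputs: fairly confident "3" and "5".
p_img1 = [0.0, 0.0, 0.1, 0.8, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0]
p_img2 = [0.0, 0.0, 0.0, 0.0, 0.1, 0.8, 0.1, 0.0, 0.0, 0.0]
dist = sum_distribution(p_img1, p_img2)
print(max(dist, key=dist.get))  # most probable sum: 8
```

In the actual framework, this combination is differentiable, so the gradient of a loss on the sum can be pushed back into the digit classifier.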
There are though also disadvantages, such as the significant increase of complexity to model complex AI tasks and that integrating different deep learning approaches may imply compute-intensive processing. Furthermore, the problem of tractability can only be circumvented to a limited extent.
As mentioned in the beginning: representing probabilities as deep learning models is just one way of integration. There are other ways, for example integrating probabilistic or logic statements into the training process, creating cause/effect relationships (e.g. Neural Association Models or neuro-symbolic reasoning), or using them to preprocess as well as enhance training data. Those work fundamentally differently in some cases, but are based on the same idea of integration. We will illustrate this in the next section based on architectures for systems leveraging deep probabilistic logic models. We will also see that the modeling aspect has certain implications on how to operate such complex models.
Deep Probabilistic Programming
Architecture
Designing AI applications using the approaches presented here cannot be solved by learning a model or two. They require complex architectures and different software engineering paradigms. Later I will mention only some libraries, but the reader should keep in mind that a lot more work needs to be done by a team of AI engineers to make this happen for real-world use cases. The following architecture illustrates this complexity as well as the different actors needed to operate such a system. Each actor usually represents a group of people that needs to collaborate with other people.
The architecture basically involves
Data
- Batch data arrives usually in files or is in large databases
- Event data arrives as continuous streams of events that are analysed over time periods ("windows")
Managing the data for complex AI solutions is crucial, but in itself a complex problem as well.
AI artefacts
All AI artefacts can be added/edited/removed by humans and also (intelligent) applications over time, which creates another long-term dynamic in the AI applications.
- Models are any trained deep learning models
- Logic statements are logic statements and possibly derived statements obtained using reasoning approaches
- Probabilistic statements are probabilistic statements and methods
Compared to only using deep learning, more AI artefacts have to be provided by the AI Operator and constantly updated. Furthermore, the complex interrelation of logic and probabilistic statements provided by different people can lead to significant challenges to operate the solution in a reliable manner.
Processing
Integration of all AI artefacts can happen at all stages of processing, in different ways. Of key importance is that high-quality data enters the system; this is ensured by the AI data quality assurance role. Examples:
- Preprocessing uses the data and also AI artefacts to do the processing. For example, logic statements can be used for detecting inconsistencies in data and correcting it or discarding it.
- Training uses the data and all AI artefacts to steer the training in the right direction. For instance, probabilistic statements can be used to point to very likely aspects: if a picture shows cloudy weather, then it is unlikely that the temperature is beyond 20° Celsius. This significantly reduces the number of examples needed and can, to some extent, ensure that training does not drift in the wrong direction.
- Prediction uses data and all AI artefacts to draw more informed conclusions about the outcome of one or more deep learning networks. For example, several objects in a picture are used as input to conclude what is happening: if several players in jerseys and a soccer ball are detected, it is unlikely that we see a tennis match.
- Prescription uses data and all AI artefacts to derive a set of automated actions to be executed. For instance, if an automated car detects a person on a street, then the objective is to avoid crashing into this person. Depending on several other conditions, the AI may use logical reasoning to find the best course of action (e.g. braking, moving out of the way etc.).
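As a minimal illustration of how such statements can steer training, a common-knowledge rule can be turned into a soft penalty that is added to the data loss. All names, numbers and the linear penalty form below are illustrative assumptions, not a fixed method:

```python
def rule_penalty(cloudy_prob, predicted_temp, threshold=20.0, weight=1.0):
    """Soft penalty for violating the common-knowledge rule
    'if cloudy, the temperature rarely exceeds 20 degrees Celsius'."""
    violation = max(0.0, predicted_temp - threshold)  # degrees above the cap
    return weight * cloudy_prob * violation           # scaled by cloudiness

def total_loss(base_loss, cloudy_prob, predicted_temp):
    # Rule-augmented training objective: data loss plus soft logic penalty.
    return base_loss + rule_penalty(cloudy_prob, predicted_temp)

# A confident "cloudy" input predicted at 25 degrees is penalized heavily...
print(total_loss(0.4, cloudy_prob=0.9, predicted_temp=25.0))  # 4.9
# ...while a prediction of 15 degrees incurs no penalty at all.
print(total_loss(0.4, cloudy_prob=0.9, predicted_temp=15.0))  # 0.4
```

In a real system the penalty would be a differentiable term in the network's loss function so that gradients push predictions toward rule-consistent outputs.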
AI Usage
AI can be used in different ways:
In many cases, it is used for advanced decision making by an analyst. Here the key is that the information processed by an AI is transparent and understood by the decision maker.
In other cases, it automates certain tasks in the background for a user. For example, based on a document uploaded by the user, the system analyses the content of the document and triggers the necessary actions, e.g. importing invoice data into a SAP system. Here, the AI operator plays a key role, monitoring that the AI behaves as expected and taking counter-actions if not.
Tools/Systems
There are a lot of libraries available for deep learning as well as for probabilistic/logic reasoning. The latter – with some exceptions – are mostly used in the academic domain. There are few libraries/software tools for deep probabilistic logic reasoning:
- Pyro.ai: Deep Universal Probabilistic Language with many examples of integrating probabilistic programing with deep learning. It is based on Pytorch.
- Edwardlib.org: Probabilistic modeling, inference and criticism with deep learning. It is based on Tensorflow/Keras.
- DeepProbLog: Probabilistic and logic statements (ProbLog) are integrated and predicates are represented by deep learning networks. It is based on Problog and Pytorch.
- Turing.jl: A probabilistic programing library for Julia
- TensorLog: Integrate deep learning with probabilistic databases. It is based on Tensorflow.
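To give a flavour of what these libraries compute under the hood, here is a toy sketch of exact probabilistic inference by enumerating possible worlds – conceptually what ProbLog does, though real engines use far more efficient techniques. The facts and probabilities are invented for the rain/wet-floor example from the introduction.

```python
from itertools import product

# Independent probabilistic facts (made-up numbers for illustration).
facts = {"rain": 0.3, "sprinkler": 0.1}

# Rule in ProbLog spirit: wet_floor :- rain. wet_floor :- sprinkler.
def wet_floor(world):
    return world["rain"] or world["sprinkler"]

def prob(query):
    """Sum the weight of every possible world in which the query holds."""
    total = 0.0
    for values in product([True, False], repeat=len(facts)):
        world = dict(zip(facts, values))
        weight = 1.0
        for fact, value in world.items():
            weight *= facts[fact] if value else 1.0 - facts[fact]
        if query(world):
            total += weight
    return total

p_wet = prob(wet_floor)  # 1 - 0.7 * 0.9 = 0.37
```

Enumeration is exponential in the number of facts; the libraries above replace it with knowledge compilation or (variational) sampling, and DeepProbLog additionally lets a neural network supply the fact probabilities.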
It should be noted that those libraries alone are not sufficient to deploy such a complex AI system as described above. You will need Big Data platforms, dashboards and fully automated software that works without human interaction to execute actions based on insights.
Impact on the AI lifecycle
The AI lifecycle is somewhat integrated with the software engineering lifecycle. Nevertheless, there are differences. While the software engineering lifecycle is more predictable, the AI lifecycle is less so, and models need to change with the data. Probabilistic and logic statements introduce yet another dimension of complexity: they largely depend on context, i.e. the common knowledge reflected in complex knowledge bases. There, it is much less evident than with deep learning what impact a change has now and in the future. Even subtle changes to a knowledge base can have a large influence on how the AI behaves in a given scenario. Furthermore, this complexity usually requires the involvement of many people/experts from different domains (depending on your knowledge base). This can introduce coordination and collaboration problems, especially if responsibilities and domains are not clarified.
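Why can a subtle knowledge-base change have such a large influence? A tiny forward-chaining sketch makes the non-local effect visible; the rules themselves are illustrative, not from any real knowledge base.

```python
# Naive forward chaining over Horn rules, each written as (head, body).
# Illustrative rule chain: rain => wet floor => slippery => close playground.
rules = [
    ("wet_floor", {"rain"}),
    ("slippery", {"wet_floor"}),
    ("close_playground", {"slippery"}),
]

def derive(initial_facts, rules):
    """Repeatedly fire rules whose body is satisfied until a fixpoint."""
    facts = set(initial_facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if body <= facts and head not in facts:
                facts.add(head)
                changed = True
    return facts

base = derive({"rain"}, rules)
# Deleting a single mid-chain rule silently drops every downstream conclusion:
reduced = derive({"rain"}, [r for r in rules if r[0] != "slippery"])
```

In a knowledge base with thousands of interdependent rules, such downstream effects are exactly what makes impact analysis of a change so much harder than for a deep learning model in isolation.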
What is next?
Integrating deep learning with logic/probabilistic reasoning can bring a lot of benefits. However, as Box said in his famous article on "Science and Statistics": "All models are wrong, but some are useful".
I want to point out here that we increase the complexity of the models significantly, and it is easy for data scientists to be tricked by their own model: good on paper, but not working in practice with messy data. This requires a good control process embedded in the application around the AI in the form of deep logic/probabilistic models.
Furthermore, the usage of a complex model should be economical. This means it can easily be reused in a larger context and does not require extensive hours of experimentation and tuning – something that can easily happen with the models introduced here. One key challenge is thus to design the deep logic/probabilistic program in a way that avoids this. Nevertheless, deep logic/probabilistic programming is probably also part of the solution to this issue, although more research is needed and applications in particular need to be evaluated.
Additionally, as we have seen, the architecture and the business processes around these models are key to their successful application. Once in production, they need to work autonomously and adapt to changing situations in an automated fashion – they cannot be retrained for every incident that occurs.
Last but not least: advanced AI will require knowledge bases of your organisation. Start creating an organisation-internal Wikidata with your employees now – it will take a lot of time, and it will grow in maturity once you run your AI applications over it.
I expect several developments in the domains mentioned in this article:
- Designing and implementing production-ready use cases – this is one of the core limitations of artificial intelligence: current research does not take real cases into account, but academic ones
- Architecting applications based on logic/probabilistic reasoning with deep learning – this requires adequate user interaction points, special databases for logic rules, knowledge bases and probabilistic inference, large-scale object stores for deep learning data, and a cybernetic design for continuously feeding the AI with statements, beliefs and data to optimise itself
- Finding adequate benchmarks/performance indicators for deep logic/probabilistic programming. This is a difficult one, as those indicators also need to include aspects of the quality of the logic/probabilistic statements provided by humans. Furthermore, the NLP/ML space is "polluted" with various indicators, and new ones are invented daily. However, a sophisticated comparison of those indicators for real-world applications is still missing (see also here)
- AI-driven organisations will develop very sophisticated collaborative knowledge bases similar to Wikidata, integrating with the knowledge bases of other organisations using Linked Data
- Hardware acceleration, for instance with GPUs, is limited to the deep learning part and does not help much with probabilistic/logic reasoning. Here, the path is not to use different hardware accelerators but to change the method of probabilistic/logic reasoning. Large knowledge bases are another issue in this context
- Integration with natural logic concepts to develop new capabilities in NLP for continuous life-long learning
- Tooling/frameworks will make it easier to apply deep logic/probabilistic programming, with automated guides highlighting issues with the model. Those guides are themselves also based on deep logic/probabilistic programming
- Increase the potential of natural language processing applications. Deep logic probabilistic programming can play a key role, for example, in classification tasks, (double) negations, question answering/expert systems, and irony or fake news detection
- Increase the potential of image analysis
- Increase the potential of forecasting
- From predictive to prescriptive analytics: Automatically perform the relevant next step in a complex AI-driven business process based on objectives/goals set by humans
- Deep logic/probabilistic profilers: similar to software profilers, they identify computational/memory bottlenecks in models and ideally suggest improvements automatically
- Ensure that a model fulfills certain security properties or is not discriminatory
- Include multi-modal data (e.g. images/videos with GPS tags)
- Integration of deep probabilistic logic approaches with geospatial analysis – as a huge amount of the data we have is of geospatial nature
- Leveraging AutoML and automated knowledge discovery for simple model development and operations
- New ways of doing AI: if one faces a problem, go back to the original data for similar problems and strengthen the neural network with that data. This is similar to re-reading an article when you have forgotten something
- New ways of doing AI: AI needs to be rewarded for collecting relevant information for the task at hand, but punished for irrelevant information – this can also be done by removing logic and probabilistic statements
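On the benchmarking point above: one conceivable indicator – an assumption of mine, not an established metric – is to report, alongside plain accuracy, the fraction of a model's predictions that are consistent with the knowledge base.

```python
# Hypothetical "rule consistency" indicator for deep logic/probabilistic
# systems: share of predictions that satisfy a given logical constraint.
def rule_consistency(predictions, constraint):
    """predictions: list of dicts; constraint: dict -> bool."""
    consistent = sum(1 for p in predictions if constraint(p))
    return consistent / len(predictions)

# Constraint from earlier in this post: cloudy implies at most 20 °C.
constraint = lambda p: (not p["cloudy"]) or p["temp_c"] <= 20

preds = [
    {"cloudy": True, "temp_c": 18},   # consistent
    {"cloudy": True, "temp_c": 25},   # violates the rule
    {"cloudy": False, "temp_c": 30},  # consistent (rule does not apply)
]
score = rule_consistency(preds, constraint)  # 2 of 3 predictions consistent
```

A high-accuracy model with low rule consistency would signal exactly the "good on paper, not in practice" failure mode discussed above.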