Hybrid Systems
Hybridization
• Integrated architectures for machine learning have been shown to provide performance
improvements over single representation architectures.
• Integration, or hybridization, is achieved using a spectrum of module or component architectures ranging from those sharing independently
functioning components to architectures in which different components are combined in inherently inseparable ways.
• In this presentation we briefly survey prototypical integrated architectures.
The combination of knowledge based systems, neural networks and evolutionary computation
forms the core of an emerging approach to
building hybrid intelligent systems capable of
reasoning and learning in an uncertain and
imprecise environment.
Combinations
Current Progress
• In recent years multiple module integrated
machine learning systems have been developed to overcome the limitations inherent in single
component systems.
• Integrations of neural networks (NN), fuzzy logic (FL) and global optimization algorithms have
received considerable attention [Abr], but
increasing attention is being paid to integrations with case based reasoning (CBR) and rule
induction (RI) [Mar, Pren].
Primary Components
• The full spectrum of knowledge representation in such systems is not confined to the primary
components.
• For example, in CBR systems, although much knowledge resides in the case library, significant problem solving knowledge may reside in
secondary technologies, such as the similarity metric used to retrieve problem-solution pairs from the case library, the adaptation
mechanisms used to improve an approximate solution, and the case library maintenance mechanisms.
MultiComponents
• Although it is possible to generalize about the relative utilities of these component types based on the primary knowledge representation
mechanisms, these generalizations may not remain valid in particular cases,
depending on the characteristics of the secondary mechanisms employed.
• Table 1 attempts to gauge the relative utilities of single-component systems based on the
primary knowledge representation.
Degree of Integration
• Besides differing in the types of component systems employed, different integrated architectures have emerged in a rather ad hoc way (Abraham [Abr]).
• The least integrated architectures consist of independent components communicating with each other on a side by side basis.
• More integration is shown in transformational or
hierarchical systems, in which one technique may be used for development and another for delivery, or one
component may be used to optimize the performance of another component.
• More fully integrated architectures combine different effects to produce a balanced overall computational model.
Transformational, hierarchical and integrated
• Abraham categorizes such systems as
transformational, hierarchical and integrated. In a transformational integrated system, one type of component may be used to produce
another, which is the functional system.
• For example, a rule based system may be used to set the initial conditions for a neural network solution to a problem.
• Thus, to create a modern intelligent system it may be necessary to make a choice of
complementary techniques.
Stand Alone Models
• Independent components that do not interact
• Solving problems that have naturally
independent components – e.g., decision
support and categorization
Transformational
• Expert systems with neural networks
• Knowledge from the ES is used to set the
initial conditions and training set of the NN
Hierarchical Hybrid
• An ANN uses a GA to optimize its
topology, and its output is fed into an ES, which creates the desired output or
explanation
Integrated – Fused Architectures
• Combine different techniques in one computational model
• Share data structures and knowledge representations
• Extended range of capabilities – e.g., classification with explanation, or,
adaptation with classification
Generalized Fused Framework
Fused Architecture
System Types for Hybridization
• Knowledge-based Systems and if-then rules
• CBR Systems
• Evolutionary Intelligence and Genetic algorithms
• Artificial Neural Networks and Learning
• Fuzzy Systems
• PSO Systems
Knowledge in Intelligent Systems
• In rule induction systems knowledge is represented
explicitly by if-then rules that are obtained from example sets.
• In neural networks knowledge is captured in synaptic weights in systems of neurons that capture
categorizations in data sets.
• In evolutionary systems knowledge is captured in
evolving pools of selected genes and in heuristics for selection of more adapted chromosomes.
• In case based systems knowledge is primarily stored in the form of case histories that represent previously
developed problem-solution pairs.
• In PSO systems the knowledge is stored in the particle swarms.
                                        CBR  KB  NN  GA  FL
Knowledge representation                 3    4   1   2   4
Uncertainty                              1    1   4   4   4
Approximation (noisy, incomplete data)   1    1   4   4   4
Adaptable                                4    2   4   4   2
Learnable                                3    1   4   4   2
Interpretable                            3    4   1   2   4
Table 1 (Adapted from [Abr, Jac] and [Neg]). A comparison of the utility of case based reasoning systems (CBR), rule induction systems (KB), neural networks (NN), genetic algorithms (GA) and fuzzy systems (FL), with 1 representing a low and 4 representing a high utility.
Interpretability
• Synaptic weights in trained neural networks are not easy to interpret, which causes particular difficulties when explanations are required.
• Genetic algorithms model natural genetic
adaptation to changing environments and thus are inherently adaptable and learn well.
• They are not easily interpretable because, although the knowledge resides partly in the selection
mechanism, it is for the most part deeply
embedded within a population of adapted genes.
Adaptability
• Case based systems are adaptable
because changing the case library may be sufficient to port a system to a related
area. If changes need to be made to the similarity metric or the adaptation
mechanism, or if the case structure needs
to be changed, much more work may be
required.
Learnability
• Fuzzy rule based systems offer more
options through which learnability may be more easily achieved.
• Fuzzy rules may be fine tuned by
adjusting the shapes of the fuzzy sets
according to user feedback [Abi].
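As a rough illustration of what adjusting the shape of a fuzzy set can mean, the sketch below uses a triangular membership function. The function name `triangular`, the temperature values and the idea that feedback moves the peak are all assumptions made for this example; they do not come from [Abi].

```python
def triangular(x, left, peak, right):
    """Membership degree of x in a triangular fuzzy set (assumes left < peak < right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Hypothetical fuzzy set "comfortable temperature" before tuning.
before = triangular(22, left=15, peak=20, right=25)

# Hypothetical user feedback says 22 degrees is ideal, so the peak is moved to 22.
after = triangular(22, left=15, peak=22, right=25)

print(before, after)
```

Only the parameters of the set change; the rules that refer to the set stay as they are, which is one reason this kind of tuning is comparatively easy.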
Rules and cases
• Rule based systems employ an easily
comprehensible but rigid representation of expert knowledge; such systems may afford better interpretation mechanisms.
• Similarly, recent research [SØR] shows that explanation techniques for large case bases are most promising, while case based learning and maintenance can often be very efficient because of the transparency of typical case libraries.
Example
Neural Expert Systems
Basic structure of a neural expert system
[Figure: blocks labelled Inference Engine, Neural Knowledge Base, Rule Extraction, Explanation Facilities, User Interface and User; data labelled Training Data, New Data and Rule: IF - THEN]
Can we combine advantages of ANNs with other IS systems to create more
powerful and effective systems?
Neural expert systems
Expert systems rely on logical inferences and decision trees and focus on modelling human
reasoning. Neural networks rely on parallel data
processing and focus on modelling a human brain.
Expert systems treat the brain as a black-box.
Neural networks look at its structure and functions,
particularly at its ability to learn.
Knowledge in a rule-based expert system is
represented by IF-THEN production rules.
Knowledge in neural networks is stored as synaptic
weights between neurons.
In expert systems, knowledge can be divided into
individual rules and the user can see and
understand the piece of knowledge applied by the
system.
In neural networks, one cannot select a single
synaptic weight as a discrete piece of knowledge.
Here knowledge is embedded in the entire
network; it cannot be broken into individual
pieces, and any change of a synaptic weight may
lead to unpredictable results. A neural network is,
in fact, a black-box for its user.
Can we combine advantages of expert systems
and neural networks to create a more powerful
and effective expert system?
A hybrid system that combines a neural network and
a rule-based expert system is called a neural expert
system (or a connectionist expert system).
The heart of a neural expert system is the
inference engine. It controls the information
flow in the system and initiates inference over the
neural knowledge base. A neural inference engine
also ensures approximate reasoning.
Approximate reasoning
In a rule-based expert system, the inference engine compares the condition part of each rule with data given in the database. When the IF part of the rule matches the data in the database, the rule is fired and
its THEN part is executed. Precise matching is required (the inference engine cannot cope with noisy or
incomplete data).
Neural expert systems use a trained neural network in place of the knowledge base. The input data does not
have to precisely match the data that was used in
network training. This ability is called approximate
reasoning.
Rule extraction
Neurons in the network are connected by links,
each of which has a numerical weight attached to it.
The weights in a trained neural network determine the strength or importance of the associated neuron
inputs.
Trained Neural Network To Identify Flying Objects
Is there any way that we could interpret the values in the weights in a meaningful way?
Algorithm
By attaching a corresponding question to each input neuron, we can enable the system to prompt the user
for initial values of the input variables:
Neuron: Wings
Question: Does the object have wings?
Neuron: Tail
Question: Does the object have a tail?
Neuron: Beak
Question: Does the object have a beak?
Neuron: Feathers
Question: Does the object have feathers?
Neuron: Engine
Question: Does the object have an engine?
Score 1 for yes, -1 for no and 0 for unknown.
Use a sign function as the activation and interpret 0 for no and 1 for yes.
Exercise: Neuro-rule inference
If we set each input of the input layer to either +1 (true), -1 (false), or 0 (unknown), we can give a semantic
interpretation for the activation of any output neuron.
For example, if the object has Wings (+1), Beak (+1) and Feathers (+1), but does not have Engine (-1),
what can we conclude about the object being a bird, a plane or a glider, applying a threshold of 0 and using the sign function as an activation function?
With Tail unknown (0), the net weighted input to the Bird neuron is:
$$X_{Bird} = 1 \cdot (-0.8) + 0 \cdot (-0.2) + 1 \cdot 2.2 + 1 \cdot 2.8 + (-1) \cdot (-1.1) = 5.3 > 0$$
We can conclude that this object may be a Bird.
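The Bird calculation can be checked with a few lines of Python. This is a minimal sketch: the weights are the ones read from the worked example (Wings -0.8, Tail -0.2, Beak 2.2, Feathers 2.8, Engine -1.1), while the names `BIRD_WEIGHTS`, `net_input` and `sign`, and the choice to map a net input of exactly 0 to +1, are assumptions made for illustration.

```python
# Weights of the Bird output neuron, as read from the worked example.
BIRD_WEIGHTS = {"Wings": -0.8, "Tail": -0.2, "Beak": 2.2, "Feathers": 2.8, "Engine": -1.1}

def net_input(inputs, weights, threshold=0.0):
    """Weighted sum of the inputs minus the threshold; inputs not given count as 0 (unknown)."""
    return sum(inputs.get(name, 0) * w for name, w in weights.items()) - threshold

def sign(x):
    """Sign activation: +1 for a non-negative net input, -1 otherwise (tie at 0 is an assumption)."""
    return 1 if x >= 0 else -1

# Wings, Beak and Feathers are present, Engine is absent, Tail is unknown.
observed = {"Wings": +1, "Beak": +1, "Feathers": +1, "Engine": -1}
x_bird = net_input(observed, BIRD_WEIGHTS)
print(round(x_bird, 1), sign(x_bird))
```

The Plane and Glider neurons would be evaluated the same way with their own weights, which are not listed in the text here.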
How can we extract rules from this
Neural Network?
Rule Extraction from a Neural Network
An inference can be made if the known net
weighted input to a neuron is greater than the
sum of the absolute values of the weights of
the unknown inputs.
$$\sum_{i=1}^{n} x_i w_i > \sum_{j=1}^{n} |w_j|$$
where $i \in$ known, $j \notin$ known and $n$ is the number
of neuron inputs.
Algorithm for Extracting Confidence
Heuristic: Known greater than unknown
Class Exercise: Confidence in Neural Rules
In the neural rules below suppose that you find an increasing amount of information about an object:
1. It has feathers.
2. It has feathers and a beak.
3. It has feathers, a beak and wings.
At what point, according to the above algorithm, can the inference be made that the object is a bird? How much difference does the
knowledge about wings make?
Enter initial value for the input Feathers: +1
KNOWN = 1 · 2.8 = 2.8
UNKNOWN = |-0.8| + |-0.2| + |2.2| + |-1.1| = 4.3
KNOWN < UNKNOWN
Enter initial value for the input Beak: +1
KNOWN = 1 · 2.8 + 1 · 2.2 = 5.0
UNKNOWN = |-0.8| + |-0.2| + |-1.1| = 2.1
KNOWN > UNKNOWN
CONCLUDE: Bird is TRUE
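The known-versus-unknown heuristic can be sketched as a small function and applied to the three stages of the class exercise. The weights are those of the Bird neuron from the worked example; the function name `can_infer` and its return format are assumptions made for this sketch.

```python
# Weights of the Bird output neuron, as read from the worked example.
BIRD_WEIGHTS = {"Wings": -0.8, "Tail": -0.2, "Beak": 2.2, "Feathers": 2.8, "Engine": -1.1}

def can_infer(known_inputs, weights):
    """Return (KNOWN, UNKNOWN, decision) for the known-greater-than-unknown heuristic."""
    # KNOWN: net weighted input contributed by the inputs whose values are known.
    known = sum(value * weights[name] for name, value in known_inputs.items())
    # UNKNOWN: sum of absolute weights of the inputs that have not been asked yet.
    unknown = sum(abs(w) for name, w in weights.items() if name not in known_inputs)
    return known, unknown, known > unknown

stages = [
    {"Feathers": +1},
    {"Feathers": +1, "Beak": +1},
    {"Feathers": +1, "Beak": +1, "Wings": +1},
]
for stage in stages:
    known, unknown, decided = can_infer(stage, BIRD_WEIGHTS)
    print(sorted(stage), round(known, 1), round(unknown, 1), decided)
```

With these weights the inference becomes possible once Feathers and Beak are known. Adding Wings lowers KNOWN, because the Wings weight is negative, but it also lowers UNKNOWN, so the conclusion still holds.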
A set of rules can be mapped into a multi-layer neural network architecture.
1. The weights between the layers represent rule certainties.
2. After establishing the initial structure of the ANN, a training algorithm may be applied.
3. After training, the weights may be used to refine the initial set of rules.
Evolutionary neural networks
Although neural networks are used for solving a
variety of problems, they still have some
limitations.
One of the most common is associated with neural network training. The back-propagation learning
algorithm cannot guarantee an optimal solution.
In real-world applications, the back-propagation algorithm might converge to a set of sub-optimal
weights from which it cannot escape. As a result,
the neural network is often unable to find a
desirable solution to a problem at hand.
Another difficulty is related to selecting an
optimal topology for the neural network. The
“right” network architecture for a particular
problem is often chosen by means of heuristics, and designing a neural network topology is still
more art than engineering.
Genetic algorithms are an effective optimisation technique that can guide both weight optimisation
and topology selection.
Encoding a set of weights in a chromosome
The second step is to define a fitness function for evaluating the chromosome’s performance. This
function must estimate the performance of a given neural network. We can apply here a
simple function defined by the sum of squared
errors.
The training set of examples is presented to the
network, and the sum of squared errors is
calculated. The smaller the sum, the fitter the
chromosome. The genetic algorithm attempts to find a set of weights that minimises the sum
of squared errors.
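A sum-of-squared-errors fitness function might look like the sketch below. The network layout (2 inputs, 2 hidden neurons, 1 output, sigmoid activations, no thresholds), the flat weight ordering in the chromosome and the XOR training set are all assumptions chosen to keep the example small; they are not taken from the slides.

```python
import math

def forward(chromosome, x, n_in=2, n_hidden=2):
    """Run a tiny n_in-n_hidden-1 sigmoid network whose weights are read from the chromosome."""
    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))
    hidden = []
    for h in range(n_hidden):
        # Incoming weights of hidden neuron h occupy one contiguous slice.
        w = chromosome[h * n_in:(h + 1) * n_in]
        hidden.append(sigmoid(sum(wi * xi for wi, xi in zip(w, x))))
    # The remaining n_hidden values are the incoming weights of the output neuron.
    out_w = chromosome[n_hidden * n_in:n_hidden * n_in + n_hidden]
    return sigmoid(sum(wi * hi for wi, hi in zip(out_w, hidden)))

def sum_squared_errors(chromosome, training_set):
    """Fitness measure: the smaller the sum, the fitter the chromosome."""
    return sum((target - forward(chromosome, x)) ** 2 for x, target in training_set)

# Hypothetical training set (XOR) and two candidate chromosomes of 6 weights each.
xor_set = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
candidates = [[0.0] * 6, [0.5, -0.5, -0.5, 0.5, 1.0, 1.0]]
fittest = min(candidates, key=lambda c: sum_squared_errors(c, xor_set))
print(sum_squared_errors(candidates[0], xor_set))
```

Because lower is better here, a genetic algorithm that maximises fitness would typically use the negative or the reciprocal of this sum.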
The third step is to choose the genetic operators –
crossover and mutation. A crossover operator takes two parent chromosomes and creates a single child with genetic material from both
parents. Each gene in the child’s chromosome is
represented by the corresponding gene of the
randomly selected parent.
A mutation operator selects a gene in a
chromosome and adds a small random value
between -1 and 1 to each weight in this gene.
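The two operators can be sketched as follows, assuming a chromosome is a list of genes and each gene is a list of weights (for instance, the incoming weights of one neuron). That representation, the function names and the example values are assumptions for illustration only.

```python
import random

def crossover(parent_a, parent_b, rng=random):
    """Build one child; each gene is copied from a randomly selected parent."""
    return [list(rng.choice((gene_a, gene_b))) for gene_a, gene_b in zip(parent_a, parent_b)]

def mutate(chromosome, rng=random):
    """Pick one gene and add a random value between -1 and 1 to each of its weights."""
    child = [list(gene) for gene in chromosome]  # copy, so the original stays unchanged
    target = rng.randrange(len(child))
    child[target] = [w + rng.uniform(-1.0, 1.0) for w in child[target]]
    return child

# Hypothetical parents with three genes each.
rng = random.Random(42)
parent_a = [[0.1, 0.2], [0.3, 0.4], [0.5]]
parent_b = [[1.1, 1.2], [1.3, 1.4], [1.5]]
child = crossover(parent_a, parent_b, rng)
mutant = mutate(child, rng)
print(child)
print(mutant)
```

Copying whole genes rather than single weights keeps the weights of one neuron together, which is the behaviour described above.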
Crossover in weight optimisation
Mutation in weight optimisation
Can genetic algorithms help us in selecting
the network architecture?
The architecture of the network (i.e. the number of neurons and their interconnections) often determines the success or failure of the application. Usually the
network architecture is decided by trial and error;
there is a great need for a method of automatically
designing the architecture for a particular application.
Genetic algorithms may well be suited for this task.
The basic idea behind evolving a suitable network
architecture is to conduct a genetic search in a
population of possible architectures.
We must first choose a method of encoding a
network’s architecture into a chromosome.
Encoding the network architecture
The connection topology of a neural network can
be represented by a square connectivity matrix.
Each entry in the matrix defines the type of
connection from one neuron (column) to another
(row), where 0 means no connection and 1
denotes a connection for which the weight can be
changed through learning.
To transform the connectivity matrix into a
chromosome, we need only to string the rows of
the matrix together.
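Stringing the rows together, and recovering the matrix again, can be sketched as below. The five-neuron network (two inputs, two hidden neurons, one output) is a hypothetical example, and the function names are assumptions.

```python
def matrix_to_chromosome(matrix):
    """String the rows of a square connectivity matrix together into one flat chromosome."""
    return [bit for row in matrix for bit in row]

def chromosome_to_matrix(chromosome, n):
    """Cut a flat chromosome back into an n-by-n connectivity matrix."""
    return [chromosome[r * n:(r + 1) * n] for r in range(n)]

# Hypothetical network: neurons 1-2 are inputs, 3-4 are hidden, 5 is the output.
# Entry [row][column] is 1 when there is a trainable connection from column to row.
connectivity = [
    [0, 0, 0, 0, 0],  # neuron 1 receives nothing
    [0, 0, 0, 0, 0],  # neuron 2 receives nothing
    [1, 1, 0, 0, 0],  # neuron 3 receives from neurons 1 and 2
    [1, 1, 0, 0, 0],  # neuron 4 receives from neurons 1 and 2
    [0, 0, 1, 1, 0],  # neuron 5 receives from neurons 3 and 4
]
chromosome = matrix_to_chromosome(connectivity)
print("".join(str(bit) for bit in chromosome))
```

Because the mapping is reversible, crossover and mutation can operate on the flat bit string and the result can be decoded back into a topology for evaluation.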
Encoding of the network topology
The cycle of evolving a neural network topology