Introduction
- Neural networks
- Low-level computational structure
- Performs well with raw data
- Can learn, but opaque to user
- Fuzzy logic
- High-level reasoning
- Using linguistic information from domain experts
- Lacks the ability to learn and adapt to new environments
- Fuzzy-neural network
- Parallel computation & learning abilities + human-like knowledge representation & explanation abilities
- Trained to develop IF-THEN fuzzy rules & determine membership functions
Fuzzy-Neural Network
- Multi-layer neural network with 3 hidden layers
- Input
- Input membership functions
- Activation function: triangular, Gaussian, etc.
- Fuzzy rules
- Conjunction of the rule antecedents evaluated by the fuzzy operation intersection: product, etc.
- $$\mu_{Rn}$$: firing strength of fuzzy rule neuron $$Rn$$
- $$w_{Rn}$$: normalized degrees of confidence (certainty factors)
- Adjusted during training
- Normalized by dividing the values by the highest weight obtained at each iteration
- Output membership functions
- Combines fuzzy rule neurons using fuzzy operator union: e.g. probabilistic sum
- Defuzzification
- Each neuron = single output of the system
- Summary
- To remove incorrect rules
- To find all relevant rules
- To find the relative importance of the rules
- To tune the shape of the membership functions
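A minimal sketch of the forward pass just described, assuming product for the conjunction of rule antecedents, probabilistic sum for the union, and a pluggable defuzzifier; all function and parameter names here are illustrative, not from the original:

```python
import numpy as np

def prob_sum(degrees):
    """Fuzzy union via probabilistic sum: a (+) b = a + b - a*b."""
    out = 0.0
    for d in degrees:
        out = out + d - out * d
    return out

def fnn_forward(x, rules, weights, defuzzify):
    """x: crisp input vector.
    rules: list of (antecedent_mfs, output_set), where antecedent_mfs holds
           one membership function per input and output_set indexes an
           output membership neuron.
    weights: normalized degrees of confidence, one per rule.
    defuzzify: maps the per-output-set union degrees to a crisp output."""
    n_out = max(out for _, out in rules) + 1
    contributions = [[] for _ in range(n_out)]
    for (mfs, out), w in zip(rules, weights):
        # Firing strength: product of antecedent membership degrees
        mu = np.prod([mf(xi) for mf, xi in zip(mfs, x)])
        contributions[out].append(w * mu)
    # Union of weighted rule firings per output membership function
    union = [prob_sum(c) for c in contributions]
    return defuzzify(union)
```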
Example: Mamdani Fuzzy Inference Model
Activation Function Example: Triangular Function
$$
y = \begin{cases} 0 & \text{if } x \le a - \frac{b}{2} \\ 1 - \frac{2|x-a|}{b} & \text{if } a - \frac{b}{2} < x < a + \frac{b}{2} \\ 0 & \text{if } x \ge a + \frac{b}{2} \end{cases}
$$
- $$a$$: center
- $$b$$: width of triangle
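A direct transcription of this triangular function (the function name is an assumption):

```python
def triangular(x, a, b):
    """Triangular membership with center a and base width b."""
    half = b / 2.0
    if x <= a - half or x >= a + half:
        return 0.0
    return 1.0 - 2.0 * abs(x - a) / b
```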
Defuzzification Example: Sum-Product Composition
Weighted average of the centroids of all output membership functions.
$$y = \frac{\sum_i \mu_{Ci}\, a_{Ci}\, b_{Ci}}{\sum_i \mu_{Ci}\, b_{Ci}}$$
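A one-to-one transcription of this formula, where `mu`, `a` and `b` hold the firing degrees, centroids and widths of the output membership functions (names assumed):

```python
def sum_product_defuzz(mu, a, b):
    """Weighted average of output-set centroids, each weighted by
    firing degree times set width (sum-product composition)."""
    num = sum(m * ai * bi for m, ai, bi in zip(mu, a, b))
    den = sum(m * bi for m, bi in zip(mu, b))
    return num / den
```

For example, `sum_product_defuzz([0.2, 0.8], [1.0, 3.0], [2.0, 2.0])` returns 2.6.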
Example: XOR
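A plausible minimal rule base for fuzzy XOR (an illustrative reconstruction, treating inputs near 0 as "small" and inputs near 1 as "large"):
- IF $$x_1$$ is small AND $$x_2$$ is small THEN $$y$$ is small
- IF $$x_1$$ is small AND $$x_2$$ is large THEN $$y$$ is large
- IF $$x_1$$ is large AND $$x_2$$ is small THEN $$y$$ is large
- IF $$x_1$$ is large AND $$x_2$$ is large THEN $$y$$ is small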
Training
- Construct the FNN from a predefined set of fuzzy rules, then train it with BP
- Bad rules will be pruned during training (see the sketch after this list)
- Construct the FNN with random rules, then use BP to train it
- Eventually the rules can be extracted from the numerical data
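A sketch of the pruning step, assuming the trained rule weights are normalized by the highest weight (as described earlier) and that rules falling below a cutoff are dropped; the threshold value and names are assumptions:

```python
def prune_rules(rules, weights, threshold=0.1):
    """Drop rules whose normalized degree of confidence is below threshold."""
    w_max = max(abs(w) for w in weights)
    kept = [(r, abs(w) / w_max) for r, w in zip(rules, weights)
            if abs(w) / w_max >= threshold]
    return [r for r, _ in kept], [w for _, w in kept]
```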
Example: Adaptive Neuro-Fuzzy Inference System (ANFIS)
Based on Sugeno Fuzzy Inference Model.
Sugeno Fuzzy Rule
IF $$x_1$$ is $$A_1$$ AND ... AND $$x_m$$ is $$A_m$$
THEN $$y = f(x_1, ..., x_m)$$
- $$y$$
- Constant: zero-order Sugeno fuzzy model
- First-order polynomial: first-order Sugeno fuzzy model
- Layer 1: input layer
- Layer 2: fuzzification layer
- Bell activation function (see the sketch after the layer list)
- $$y = \frac{1}{1 + \left(\frac{x - a}{c}\right)^{2b}}$$
- $$a$$: center
- $$b$$: slope
- $$c$$: width
- Layer 3: rule layer
- Firing strength (truth value) of each rule
- Conjunction of the rule antecedents: product operator
- $$y_i = \prod_{j=1}^{m} x_{ji}$$
- Layer 4: normalization layer
- Normalized firing strength of a given rule
- Represents the contribution of a rule to the final result
- $$y_i = \frac{x_{ii}}{\sum_{j=1}^{n} x_{ji}} = \frac{\mu_i}{\sum_{j=1}^{n}\mu_j} = \overline{\mu_i}$$
- Layer 5: defuzzification layer
- Calculates weighted consequent value of a given rule
- $$y_i = \overline{\mu_i}\left[k_{i0} + k_{i1}x_1 + \dots + k_{im}x_m\right]$$
- $$k_{ij}$$: consequent parameters of rule $$i$$
- Layer 6: summation neuron
- $$y = \sum^n_{i=1} x_i$$
- Functionally equivalent to a first-order Sugeno fuzzy model
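A minimal sketch of one forward pass through layers 1-6 for a first-order model, assuming the bell membership function defined above; the parameter layout and names are illustrative:

```python
import numpy as np

def bell(x, a, b, c):
    """Generalized bell membership: center a, slope b, width c (as above)."""
    return 1.0 / (1.0 + (((x - a) / c) ** 2) ** b)

def anfis_forward(x, premise, consequent):
    """x: inputs, shape (m,).
    premise: per rule, one (a, b, c) triple per input (layers 1-2).
    consequent: array of k_ij parameters, shape (n, 1 + m) (layer 5)."""
    x = np.asarray(x, dtype=float)
    # Layer 3: firing strengths as products of membership degrees
    mu = np.array([np.prod([bell(xj, *abc) for xj, abc in zip(x, triples)])
                   for triples in premise])
    mu_bar = mu / mu.sum()                       # Layer 4: normalization
    f = consequent @ np.concatenate(([1.0], x))  # k_i0 + k_i1*x_1 + ... + k_im*x_m
    return float(np.sum(mu_bar * f))             # Layers 5-6: weighted sum
```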
Training
Hybrid learning algorithm combining least-squares estimator & gradient descent method.
- Assign initial activation functions to each membership neuron
- Centers of the functions connected to each input are set so that the input domain is divided equally
- Widths & slopes are set to allow sufficient overlapping of neighboring functions
- Epoch
- Forward pass: learn consequent parameters
- Input: $$x_i(p)$$, output: $$y_d(p)$$ for $$p = 1, ..., P$$ (number of input-output patterns)
- Forward propagate $$y_d = A k$$
- $$y_d$$: $$P \times 1$$
- $$y_d = [y_d(1) \dots y_d(P)]^T$$
- $$A$$: $$P \times n(1+m)$$
- $$k$$: $$n(1+m) \times 1$$ (number of consequent parameters)
- $$k = [k_{10}\; k_{11} \dots k_{1m}\; k_{20} \dots k_{nm}]^T$$
- $$P$$ usually greater than $$n(1+m)$$
- $$k^*$$: least-squares estimate of $$k$$
- $$k^* = (A^TA)^{-1}A^Ty_d$$
- Minimizes the squared error $$\|Ak - y_d\|^2$$
- Error vector $$e$$
- $$e = y_d - y = y_d - Ak^*$$
- Backward pass: learn antecedent/premise parameters
- $$\Delta a = -\alpha \frac{\partial E}{\partial a}$$
- $$E = \frac{1}{2}e^2$$
- $$\frac{\partial E}{\partial a} = \frac{\partial E}{\partial e}\frac{\partial e}{\partial y}\frac{\partial y}{\partial (\overline{\mu_i} f_i)}\frac{\partial (\overline{\mu_i} f_i)}{\partial \overline{\mu_i}}\frac{\partial \overline{\mu_i}}{\partial \mu_i}\frac{\partial \mu_i}{\partial \mu_{A_j}}\frac{\partial \mu_{A_j}}{\partial a}$$
$$\begin{aligned} \overline{\mu_i} &= \frac{\mu_i}{\sum_{j=1}^{n} \mu_j} \\ \frac{\partial \overline{\mu_i}}{\partial \mu_i} &= \frac{\left(\sum_{j=1}^{n} \mu_j\right) - \mu_i}{\left(\sum_{j=1}^{n} \mu_j\right)^2} \\ &= \frac{1}{\mu_i}\left(\frac{\mu_i}{\sum_{j=1}^{n} \mu_j} - \frac{\mu_i^2}{\left(\sum_{j=1}^{n} \mu_j\right)^2}\right) \\ &= \frac{1}{\mu_i}\left(\overline{\mu_i} - \overline{\mu_i}^2\right) = \frac{\overline{\mu_i}\left(1 - \overline{\mu_i}\right)}{\mu_i} \end{aligned}$$
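A sketch of the forward-pass least-squares step, assuming the normalized firing strengths `mu_bar` for all $$P$$ patterns are already available from the network; `numpy.linalg.lstsq` is used in place of the explicit pseudo-inverse $$(A^TA)^{-1}A^Ty_d$$ for numerical stability:

```python
import numpy as np

def consequent_lse(X, mu_bar, y_d):
    """X: (P, m) inputs; mu_bar: (P, n) normalized firing strengths;
    y_d: (P,) desired outputs. Returns k* with shape (n*(1+m),)."""
    P, m = X.shape
    n = mu_bar.shape[1]
    ones_x = np.hstack([np.ones((P, 1)), X])   # rows [1, x_1(p), ..., x_m(p)]
    # Row p of A: mu_bar_i(p) * [1, x_1(p), ..., x_m(p)] for i = 1..n,
    # matching k = [k_10 k_11 ... k_1m k_20 ... k_nm]^T
    A = (mu_bar[:, :, None] * ones_x[:, None, :]).reshape(P, n * (1 + m))
    k_star, *_ = np.linalg.lstsq(A, y_d, rcond=None)  # minimizes ||A k - y_d||^2
    e = y_d - A @ k_star            # error vector fed to the backward pass
    return k_star, e
```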
Summary
Type 1: use ANN to determine the membership functions
- Determine membership functions through ANN
- Combined with predetermined set of fuzzy rules
Type 2: use ANN to determine the fuzzy rules
- Extract fuzzy rules from training data (clustering approach)
- Combined with predetermined fuzzy sets
Type 3: online adaptation of membership functions
- Initialize fuzzy rules & membership functions
- Update/learn the membership functions
Type 4: weighted rules for the fuzzy system
- Update/learn the importance of each rule (weighted rules) in the fuzzy system