Demo for the PlutoCon 201

Author: Daniel Molina Cabrera

Talk: Teaching Computer Sciences and Metaheuristics Algorithms using Pluto

Grouping with constraints Problem

In this problem, a dataset contains N instances, each a numerical vector, and the goal is to group them into k clusters.

The clustering is chosen according to two criteria:

Satisfy as many constraints as possible. Some pairs of instances must be in the same cluster (listed in ML), and other pairs must be in different clusters (listed in CL). The number of violated constraints is measured as:

$\mathit{infeasibility} = \sum_{i = 1}^{|ML|} \mathrm{bool2int}\left(hc(\vec{ML_{[i, 1]}}) \neq hc(\vec{ML_{[i, 2]}})\right) + \sum_{i = 1}^{|CL|} \mathrm{bool2int}\left(hc(\vec{CL_{[i, 1]}}) = hc(\vec{CL_{[i, 2]}})\right)$

Here $hc(\vec{x})$ is the cluster assigned to instance $\vec{x}$.

Reduce the intra-cluster distance:

For each cluster $c_{i}$, its centroid is calculated as the mean of the instances belonging to that cluster: $\vec{μ_{i}} = \frac{1}{|c_{i}|} \sum_{\vec{x_{j}} \in c_{i}} \vec{x_{j}}$
The intra-cluster distance is calculated as $\bar{c_{i}} = \frac{1}{|c_{i}|} \sum_{\vec{x_{j}} \in c_{i}} \| \vec{x_{j}} - \vec{μ_{i}} \|_{2}$.
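
Both criteria can be sketched in a few lines of Julia. This is only an illustration under stated assumptions: the names `infeasibility`, `cluster_distance` and `intra_cluster_distance` are hypothetical (they are not the notebook's own functions), instances are stored one per row in a matrix `X`, `hc[i]` holds the cluster of instance `i`, and every cluster is assumed to be non-empty.

```julia
using LinearAlgebra: norm
using Statistics: mean

# Number of violated constraints. ML and CL are lists of index pairs.
function infeasibility(hc, ML, CL)
    broken_ml = count(((a, b),) -> hc[a] != hc[b], ML)   # must-link pairs that are split
    broken_cl = count(((a, b),) -> hc[a] == hc[b], CL)   # cannot-link pairs that are together
    return broken_ml + broken_cl
end

# Mean distance from the members of cluster `c` to their centroid.
function cluster_distance(X, hc, c)
    members = X[hc .== c, :]
    μ = vec(mean(members, dims=1))                        # centroid = mean of the members
    return mean(norm(row .- μ) for row in eachrow(members))
end

# Average of the per-cluster distances over the k clusters.
intra_cluster_distance(X, hc, k) = mean(cluster_distance(X, hc, c) for c in 1:k)

# Small usage example with four points and two clusters.
X  = [0.0 0.0; 0.0 2.0; 4.0 0.0; 4.0 2.0]
hc = [1, 1, 2, 2]
ML = [(1, 2), (1, 3)]
CL = [(3, 4)]

infeasibility(hc, ML, CL)          # 2: the pair (1, 3) is split and (3, 4) is together
intra_cluster_distance(X, hc, 2)   # 1.0
```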

Initial problem:

Given several points such as the following:

scene (generic function with 1 method)

Problem: number k and constraints

|ML|: |CL|:

There are several constraints between instances:

ML: {(4, 2), (9, 4), (8, 3)}

CL: {(6, 2)}

plot_pac (generic function with 2 methods)

We draw a solid line between instances that must be in the same cluster, and a dashed line between instances that must be in different clusters.

Initial solution:

An initial solution is a partition, represented by a vector of length N in which each position i holds a value between 1 and k: the cluster assigned to instance i.
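
As an illustration of this representation, the sketch below builds a random partition. The helper name `random_partition` is hypothetical, and forcing every cluster to be non-empty is an assumption borrowed from the local search requirement later in the notebook.

```julia
using Random

# Hypothetical helper: random partition of n instances into k non-empty clusters.
function random_partition(rng::AbstractRNG, n::Integer, k::Integer)
    n >= k || throw(ArgumentError("need at least k instances"))
    # One instance per cluster first, the rest at random, then shuffle the positions.
    sol = vcat(1:k, rand(rng, 1:k, n - k))
    return shuffle!(rng, sol)
end

rng = MersenneTwister(42)
grouping = random_partition(rng, 10, 3)   # vector of length 10 with values in 1:3
```

Position `i` of `grouping` is the cluster of instance `i`, so the vector can be passed directly to functions that evaluate or plot a solution.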

colores: [:green, :orange, :black]

We mark each cluster with a different color; the title shows how many constraints are violated and the average intra-cluster distance.

plot_pac(grouping)

incumple (generic function with 1 method)

dist_intra (generic function with 1 method)

distancia_intracluster (generic function with 1 method)

plot_clusters (generic function with 1 method)

Greedy Algorithm

The greedy algorithm builds a solution step by step. The process is as follows:

Randomly generate k centroids, one per cluster.

Seed:

3×2 Matrix{Float64}:
 0.0123788  0.971216
 0.455168   0.844532
 0.90785    0.368487

Each element is assigned to the closest cluster among those that violate the fewest constraints.

clusters: {0.012378843419466268, 0.45516843007784, 0.9078498956593186, 0.971215649338451, 0.8445317509734542, 0.36848703579346}

If the centroids need to be updated, recalculate them and go to step 2.
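
The three steps can be sketched as follows. This is a minimal illustration, not the notebook's own `step_greedy`: the names `violations` and `greedy` are hypothetical, the data are assumed to lie in the unit square so that `rand` gives sensible starting centroids, elements are visited in random order, and ties in the number of violations are broken by distance to the centroid.

```julia
using LinearAlgebra: norm
using Random
using Statistics: mean

# Constraints broken if instance i joins cluster c, given the partial
# assignment `hc` (0 means "not assigned yet").
function violations(i, c, hc, ML, CL)
    v = 0
    for (a, b) in ML
        other = a == i ? b : (b == i ? a : 0)
        other != 0 && hc[other] != 0 && hc[other] != c && (v += 1)
    end
    for (a, b) in CL
        other = a == i ? b : (b == i ? a : 0)
        other != 0 && hc[other] == c && (v += 1)
    end
    return v
end

function greedy(rng, X, k, ML, CL; max_iters=100)
    n = size(X, 1)
    centroids = rand(rng, k, size(X, 2))       # step 1: random centroids in [0, 1)
    hc = zeros(Int, n)
    for _ in 1:max_iters
        new_hc = zeros(Int, n)
        for i in shuffle(rng, 1:n)             # step 2: assign the elements one by one
            costs = [(violations(i, c, new_hc, ML, CL),
                      norm(X[i, :] .- centroids[c, :])) for c in 1:k]
            new_hc[i] = argmin(costs)          # fewest violations first, then closest
        end
        new_hc == hc && break                  # step 3: stop when nothing changes
        hc = new_hc
        for c in 1:k                           # recompute the non-empty centroids
            any(hc .== c) && (centroids[c, :] = vec(mean(X[hc .== c, :], dims=1)))
        end
    end
    return hc, centroids
end

X  = [0.1 0.1; 0.15 0.2; 0.9 0.9; 0.85 0.8]
ML = [(1, 2)]
CL = [(1, 3)]
hc, centroids = greedy(MersenneTwister(1), X, 2, ML, CL)
```

Comparing `(violations, distance)` tuples with `argmin` gives the lexicographic rule directly, which keeps the assignment step short.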

The final solution obtained is:

creaV (generic function with 1 method)

count_violations (generic function with 1 method)

plot_greedy_options (generic function with 1 method)

step_greedy (generic function with 1 method)

Meta-heuristic

To apply a simple meta-heuristic we need a single objective function, so we combine both criteria into a single fitness function.

The measure is defined as $\bar{C} + (\mathit{infeasibility} \cdot λ)$: the intra-cluster distance $\bar{C}$ combined with a penalty of $λ$ for each constraint that is not fulfilled.

The first step is to calculate $λ$, defined as the maximum distance between two points divided by the total number of constraints $R$: $λ = \frac{\max(Dist_{ij})}{R}, \forall i, j \in \{1, \ldots, n\}$

$λ$ value: 0.23 = 0.9205 / 4
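
The computation of $λ$ and of the combined measure can be sketched as below. The names `compute_lambda` and `fitness_value` are hypothetical, and $R$ is assumed to be $|ML| + |CL|$, which matches the value shown above (3 ML constraints plus 1 CL constraint gives 4).

```julia
using LinearAlgebra: norm

# λ = (largest pairwise distance) / (number of constraints).
# X is an N×d matrix with one instance per row (at least two rows).
function compute_lambda(X, ML, CL)
    n = size(X, 1)
    max_dist = maximum(norm(X[i, :] .- X[j, :]) for i in 1:n for j in i+1:n)
    return max_dist / (length(ML) + length(CL))
end

# Combined objective: intra-cluster distance plus λ per violated constraint.
fitness_value(avg_dist, n_violations, λ) = avg_dist + λ * n_violations

X  = [0.0 0.0; 3.0 4.0; 1.0 1.0]
ML = [(1, 3)]
CL = [(1, 2)]

λ = compute_lambda(X, ML, CL)   # 2.5: the largest distance is 5.0 and there are 2 constraints
fitness_value(1.0, 2, λ)        # 6.0
```

Scaling the penalty by the largest distance keeps one violated constraint comparable in size to the distance term, so neither criterion is ignored.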

fitness (generic function with 1 method)

The local search procedure randomly chooses one position and changes its value, making sure that every cluster keeps at least one member.
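
A minimal sketch of this move is shown below. It is not the notebook's `BL` function: the name `local_search` is hypothetical, the fitness function is passed in as `f`, k is assumed to be at least 2, and keeping a move only when it improves the fitness is an assumption about the acceptance rule.

```julia
using Random

# Pick a random position, give it a different cluster, and keep the change
# only if the fitness improves. Returns the solution, its fitness and the
# fitness history (useful for a convergence plot).
function local_search(f, sol::Vector{Int}, k::Integer;
                      iters=1000, rng=Random.default_rng())
    best = copy(sol)
    best_fit = f(best)
    history = [best_fit]
    for _ in 1:iters
        i = rand(rng, 1:length(best))
        old = best[i]
        # Skip moves that would leave the old cluster empty.
        count(==(old), best) > 1 || continue
        best[i] = rand(rng, setdiff(1:k, old))
        new_fit = f(best)
        if new_fit < best_fit
            best_fit = new_fit
        else
            best[i] = old                      # undo a non-improving move
        end
        push!(history, best_fit)
    end
    return best, best_fit, history
end

# Toy fitness: number of positions that differ from a target partition.
target = [1, 1, 2, 2]
toy_fitness(s) = count(s .!= target)

start = [2, 1, 1, 2]
best, best_fit, history = local_search(toy_fitness, start, 2;
                                       iters=200, rng=MersenneTwister(7))
```

Because non-improving moves are undone, `history` never increases, which is the shape expected in a convergence plot.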

BL (generic function with 1 method)

The following convergence plot shows how the fitness improves over the run of the algorithm.
