
GMMInitializer API

GMMInitializer

A utility class providing various initialization strategies for GMM parameters.

This class defines several static methods to produce initial means, weights, and covariances for Gaussian Mixture Models from a dataset data (a 2D tensor of shape (N, D)):

  • random
  • points
  • kpp
  • kmeans
  • maxdist

Mathematical Descriptions:

  • random:
    Computes the empirical mean \(\bar{x}\) and covariance \(\Sigma\) of data and draws initial centers as: $$ \mu_i = \bar{x} + L z, \quad z \sim \mathcal{N}(0, I_d), $$ where \(L\) is the Cholesky factor of \(\Sigma\).

  • points:
    Randomly selects \(k\) data points: $$ \mu_i = x_{s_i}, \quad \text{for } s_i \in \text{random subset of } \{1, \dots, N\}. $$

  • kpp (k-means++):
    Chooses the first center uniformly at random and subsequent centers with probability proportional to the squared distance to the nearest already chosen center: $$ P(x_j) = \frac{D(x_j)}{\sum_{m=1}^{N} D(x_m)}, \quad \text{where } D(x_j) = \min_{l} \|x_j - \mu_l\|^2. $$

  • kmeans:
    Runs the k-means algorithm starting from k-means++ initialization. At each iteration: $$ c_j = \arg\min_i \|x_j - \mu_i\|^2, \quad \mu_i = \frac{1}{|C_i|} \sum_{x_j \in C_i} x_j, $$ until convergence.

  • maxdist:
    A modified k-means++ that selects subsequent centers as: $$ \mu_i = \arg\max_{x} \min_{l < i} \|x - \mu_l\|, $$ and then reselects the first center as: $$ \mu_1 = \arg\max_{x} \min_{l=2}^{k} \|x - \mu_l\|. $$

Example usage::

    import torch
    from tgmm.gmm_init import GMMInitializer

    data = torch.randn(1000, 2)  # Synthetic data
    k = 4
    init_means = GMMInitializer.random(data, k)
Source code in tgmm/gmm_init.py
class GMMInitializer:
    r"""
    A utility class providing various initialization strategies for GMM parameters.

    This class defines several static methods to produce initial means, weights, and
    covariances for Gaussian Mixture Models from a dataset ``data`` (a 2D tensor of shape (N, D)):

    - :func:`random`
    - :func:`points`
    - :func:`kpp`
    - :func:`kmeans`
    - :func:`maxdist`

    **Mathematical Descriptions:**

    - **random:**  
      Computes the empirical mean $\bar{x}$ and covariance $\Sigma$ of ``data`` and draws
      initial centers as:
      $$
      \mu_i = \bar{x} + L z, \quad z \sim \mathcal{N}(0, I_d),
      $$
      where $ L $ is the Cholesky factor of $\Sigma$.

    - **points:**  
      Randomly selects $ k $ data points:
      $$
      \mu_i = x_{s_i}, \quad \text{for } s_i \in \text{random subset of } \{1, \dots, N\}.
      $$

    - **kpp (k-means++):**  
      Chooses the first center uniformly at random and subsequent centers with probability
      proportional to the squared distance to the nearest already chosen center:
      $$
      P(x_j) = \frac{D(x_j)}{\sum_{m=1}^{N} D(x_m)}, \quad \text{where } D(x_j) = \min_{l} \|x_j - \mu_l\|^2.
      $$

    - **kmeans:**  
      Runs the k-means algorithm starting from k-means++ initialization. At each iteration:
      $$
      c_j = \arg\min_i \|x_j - \mu_i\|^2, \quad \mu_i = \frac{1}{|C_i|} \sum_{x_j \in C_i} x_j,
      $$
      until convergence.

    - **maxdist:**  
      A modified k-means++ that selects subsequent centers as:
      $$
      \mu_i = \arg\max_{x} \min_{l < i} \|x - \mu_l\|,
      $$
      and then reselects the first center as:
      $$
      \mu_1 = \arg\max_{x} \min_{l=2}^{k} \|x - \mu_l\|.
      $$

    Example usage::

        import torch
        from tgmm.gmm_init import GMMInitializer

        data = torch.randn(1000, 2)  # Synthetic data
        k = 4
        init_means = GMMInitializer.random(data, k)
    """

    @staticmethod
    def random(data: torch.Tensor, k: int) -> torch.Tensor:
        r"""
        Randomly initialize cluster centers by sampling from the empirical
        distribution of ``data``.

        Mathematically, if $\bar{x}$ and $\Sigma$ are the sample mean and covariance
        of the data, then:
        $$
        \mu_i = \bar{x} + L z, \quad z \sim \mathcal{N}(0, I_d),
        $$
        where $L$ is the Cholesky factor of $\Sigma$.

        Parameters
        ----------
        data : torch.Tensor
            A 2D tensor of shape (N, D) representing the dataset.
        k : int
            Number of cluster centers to generate.

        Returns
        -------
        torch.Tensor
            A (k, D) tensor representing the initial cluster centers.
        """
        mu = torch.mean(data, dim=0)
        if data.dim() == 1:
            cov = torch.var(data)
            samples = torch.randn(k, device=data.device) * torch.sqrt(cov)
        else:
            cov = torch.cov(data.t())
            samples = torch.randn(k, data.size(1), device=data.device) @ torch.linalg.cholesky(cov).t()
        samples += mu
        return samples

    @staticmethod
    def points(data: torch.Tensor, k: int) -> torch.Tensor:
        r"""
        Initialize cluster centers by randomly selecting existing data points.

        Parameters
        ----------
        data : torch.Tensor
            A 2D tensor of shape (N, D) representing the dataset.
        k : int
            Number of cluster centers to generate.

        Returns
        -------
        torch.Tensor
            A (k, D) tensor representing the initial cluster centers.
        """
        indices = torch.randperm(data.size(0), device=data.device)[:k]
        return data[indices]

    @staticmethod
    def kpp(data: torch.Tensor, k: int) -> torch.Tensor:
        r"""
        Initialize cluster centers using the k-means++ algorithm.

        The first center is chosen uniformly at random. Subsequent centers are chosen
        with probability proportional to the squared distance from the nearest existing
        center:
        $$
        P(x_j) = \frac{D(x_j)}{\sum_{m=1}^{N} D(x_m)}, \quad D(x_j) = \min_{l} \|x_j - \mu_l\|^2.
        $$

        Parameters
        ----------
        data : torch.Tensor
            A 2D tensor of shape (N, D) representing the dataset.
        k : int
            Number of cluster centers to generate.

        Returns
        -------
        torch.Tensor
            A (k, D) tensor representing the initial cluster centers.
        """
        n_samples, _ = data.shape
        centroids = torch.empty((k, data.size(1)), device=data.device)

        # Pick the first center uniformly at random
        initial_idx = torch.randint(0, n_samples, (1,), device=data.device)
        centroids[0] = data[initial_idx]

        for i in range(1, k):
            dist_sq = torch.cdist(data, centroids[:i]).pow(2).min(dim=1)[0]
            probabilities = dist_sq / dist_sq.sum()
            selected_idx = torch.multinomial(probabilities, 1)
            centroids[i] = data[selected_idx]

        return centroids

    @staticmethod
    def kmeans(data: torch.Tensor, k: int, max_iter: int = 1000, atol: float = 1e-4) -> torch.Tensor:
        r"""
        Initialize cluster centers by running the k-means algorithm on ``data``.

        Starting from a k-means++ initialization, k-means iteratively refines the centers by:

        1. **Assignment:**  
           $c_j = \arg\min_{i} \|x_j - \mu_i\|^2,$ for each data point $x_j$.

        2. **Update:**  
           $\mu_i = \frac{1}{|C_i|} \sum_{x_j \in C_i} x_j,$ where $C_i$ is the set of points assigned to center $\mu_i$.

        The algorithm stops when the centers move by less than the specified tolerance.

        Parameters
        ----------
        data : torch.Tensor
            A 2D tensor of shape (N, D) representing the dataset.
        k : int
            Number of cluster centers to generate.
        max_iter : int, optional
            Maximum number of iterations (default is 1000).
        atol : float, optional
            Convergence tolerance (default is $1\times10^{-4}$).

        Returns
        -------
        torch.Tensor
            A (k, D) tensor representing the final cluster centers.
        """
        centroids = GMMInitializer.kpp(data, k)

        for _ in range(max_iter):
            distances = torch.cdist(data, centroids)
            labels = torch.argmin(distances, dim=1)
            new_centroids = torch.stack([data[labels == i].mean(dim=0) for i in range(k)])
            if torch.allclose(centroids, new_centroids, atol=atol):
                break
            centroids = new_centroids

        return centroids

    @staticmethod
    def maxdist(data: torch.Tensor, k: int) -> torch.Tensor:
        r"""
        A modified k-means++ initialization that maximizes the minimum distance
        between centers.

        After randomly selecting the first center, each subsequent center is chosen as:
        $$
        \mu_i = \arg\max_{x \in \mathcal{D}} \min_{l=1,\dots,i-1} \|x - \mu_l\|,
        $$
        ensuring that the new center is as far as possible from the existing centers.
        Finally, the first center is reselected as:
        $$
        \mu_1 = \arg\max_{x \in \mathcal{D}} \min_{l=2}^{k} \|x - \mu_l\|.
        $$

        Parameters
        ----------
        data : torch.Tensor
            A 2D tensor of shape (N, D).
        k : int
            Number of cluster centers.

        Returns
        -------
        torch.Tensor
            A (k, D) tensor representing the initial cluster centers.
        """
        n_samples, _ = data.shape
        centroids = torch.empty((k, data.size(1)), device=data.device)

        initial_idx = torch.randint(0, n_samples, (1,), device=data.device)
        centroids[0] = data[initial_idx]

        for i in range(1, k):
            dist_sq = torch.cdist(data, centroids[:i]).pow(2)
            min_dist = dist_sq.min(dim=1)[0]
            selected_idx = torch.argmax(min_dist)
            centroids[i] = data[selected_idx]

        dist_sq_to_first = torch.cdist(data, centroids[1:]).pow(2)
        min_dist_to_first = dist_sq_to_first.min(dim=1)[0]
        new_first_idx = torch.argmax(min_dist_to_first)
        centroids[0] = data[new_first_idx]

        return centroids

    # ====================================================================
    # Weight Initialization Methods
    # ====================================================================

    @staticmethod
    def init_weights_uniform(n_components: int, device: torch.device) -> torch.Tensor:
        r"""
        Initialize weights uniformly (equal weights for all components).

        Parameters
        ----------
        n_components : int
            Number of mixture components.
        device : torch.device
            Device to create the tensor on.

        Returns
        -------
        torch.Tensor
            Uniform weights of shape (n_components,).
        """
        return torch.full(
            (n_components,),
            1.0 / n_components,
            dtype=torch.float32,
            device=device
        )

    @staticmethod
    def init_weights_random(n_components: int, device: torch.device) -> torch.Tensor:
        r"""
        Initialize weights randomly from a Dirichlet distribution.

        Parameters
        ----------
        n_components : int
            Number of mixture components.
        device : torch.device
            Device to create the tensor on.

        Returns
        -------
        torch.Tensor
            Random weights of shape (n_components,) that sum to 1.
        """
        alpha = torch.ones(n_components, device=device)
        weights = torch.distributions.Dirichlet(alpha).sample()
        return weights.float()

    @staticmethod
    def init_weights_from_clusters(data: torch.Tensor, means: torch.Tensor) -> torch.Tensor:
        r"""
        Initialize weights proportionally to cluster sizes based on k-means assignment.

        Parameters
        ----------
        data : torch.Tensor
            Input data of shape (n_samples, n_features).
        means : torch.Tensor
            Cluster means of shape (n_components, n_features).

        Returns
        -------
        torch.Tensor
            Weights of shape (n_components,) proportional to cluster sizes.
        """
        if data.dim() == 1:
            data = data.unsqueeze(1)

        n_components = means.size(0)

        # Assign points to nearest mean
        distances = torch.cdist(data, means)
        labels = torch.argmin(distances, dim=1)

        # Count points per cluster
        counts = torch.bincount(labels, minlength=n_components).float()

        # Handle empty clusters
        counts = torch.clamp(counts, min=1e-10)

        # Normalize to get weights
        return counts / counts.sum()

    # ====================================================================
    # Covariance Initialization Methods
    # ====================================================================

    @staticmethod
    def init_covariances_eye(n_components: int, n_features: int, 
                            covariance_type: str, reg_covar: float, 
                            device: torch.device) -> torch.Tensor:
        r"""
        Initialize covariances as identity-like matrices/vectors.

        Parameters
        ----------
        n_components : int
            Number of mixture components.
        n_features : int
            Number of features.
        covariance_type : str
            Type of covariance ('full', 'diag', 'spherical', 'tied_full', 'tied_diag', 'tied_spherical').
        reg_covar : float
            Regularization added to diagonal.
        device : torch.device
            Device to create tensors on.

        Returns
        -------
        torch.Tensor
            Initialized covariances with appropriate shape.
        """
        if covariance_type == 'full':
            out = []
            for _ in range(n_components):
                mat = torch.eye(n_features, device=device) * (1.0 + reg_covar)
                out.append(mat)
            return torch.stack(out, dim=0)
        elif covariance_type == 'diag':
            return torch.ones(n_components, n_features, device=device) * (1.0 + reg_covar)
        elif covariance_type == 'spherical':
            return torch.ones(n_components, device=device) * (1.0 + reg_covar)
        elif covariance_type == 'tied_full':
            return torch.eye(n_features, device=device) * (1.0 + reg_covar)
        elif covariance_type == 'tied_diag':
            return torch.ones(n_features, device=device) * (1.0 + reg_covar)
        elif covariance_type == 'tied_spherical':
            return torch.tensor(1.0 + reg_covar, device=device)
        else:
            raise ValueError(f"Unsupported covariance type: {covariance_type}")

    @staticmethod
    def init_covariances_random(n_components: int, n_features: int,
                               covariance_type: str, reg_covar: float,
                               device: torch.device) -> torch.Tensor:
        r"""
        Initialize covariances randomly as positive semi-definite matrices.

        Parameters
        ----------
        n_components : int
            Number of mixture components.
        n_features : int
            Number of features.
        covariance_type : str
            Type of covariance.
        reg_covar : float
            Regularization added to diagonal.
        device : torch.device
            Device to create tensors on.

        Returns
        -------
        torch.Tensor
            Random covariances with appropriate shape.
        """
        if covariance_type in ('full', 'tied_full'):
            def random_spd(dim):
                A = torch.randn(dim, dim, device=device)
                return A @ A.mT + reg_covar * torch.eye(dim, device=device)

            if covariance_type == 'full':
                return torch.stack([random_spd(n_features) for _ in range(n_components)], dim=0)
            else:
                return random_spd(n_features)
        elif covariance_type in ('diag', 'tied_diag'):
            shape = (n_components, n_features) if covariance_type == 'diag' else (n_features,)
            return torch.rand(shape, device=device) + reg_covar
        elif covariance_type in ('spherical', 'tied_spherical'):
            shape = (n_components,) if (covariance_type == 'spherical') else ()
            return torch.rand(shape, device=device) + (1.0 + reg_covar)
        else:
            raise ValueError(f"Unsupported covariance type: {covariance_type}")

    @staticmethod
    def init_covariances_global(data: torch.Tensor, n_components: int,
                               covariance_type: str, reg_covar: float) -> torch.Tensor:
        r"""
        Initialize covariances using global data covariance.

        Parameters
        ----------
        data : torch.Tensor
            Input data of shape (n_samples, n_features).
        n_components : int
            Number of mixture components.
        covariance_type : str
            Type of covariance.
        reg_covar : float
            Regularization added to diagonal.

        Returns
        -------
        torch.Tensor
            Initialized covariances based on global data covariance.
        """
        if data.dim() == 1:
            data = data.unsqueeze(1)

        n_features = data.size(1)
        device = data.device

        # Compute global covariance matrix
        global_cov = torch.cov(data.T)
        global_cov += reg_covar * torch.eye(n_features, device=device)

        if covariance_type == 'full':
            return global_cov.unsqueeze(0).repeat(n_components, 1, 1)
        elif covariance_type == 'diag':
            diag_vals = torch.diagonal(global_cov)
            return diag_vals.unsqueeze(0).repeat(n_components, 1)
        elif covariance_type == 'spherical':
            avg_var = torch.mean(torch.diagonal(global_cov))
            return torch.full((n_components,), avg_var.item(), device=device)
        elif covariance_type == 'tied_full':
            return global_cov
        elif covariance_type == 'tied_diag':
            return torch.diagonal(global_cov)
        elif covariance_type == 'tied_spherical':
            return torch.mean(torch.diagonal(global_cov))
        else:
            raise ValueError(f"Unsupported covariance type: {covariance_type}")

    @staticmethod
    def init_covariances_empirical(data: torch.Tensor, means: torch.Tensor,
                                  covariance_type: str, reg_covar: float) -> torch.Tensor:
        r"""
        Initialize covariances empirically from cluster assignments.

        Assigns each data point to its nearest mean and computes cluster-wise
        empirical covariance matrices.

        Parameters
        ----------
        data : torch.Tensor
            Input data of shape (n_samples, n_features).
        means : torch.Tensor
            Cluster means of shape (n_components, n_features).
        covariance_type : str
            Type of covariance.
        reg_covar : float
            Regularization added to diagonal.

        Returns
        -------
        torch.Tensor
            Empirical covariances with appropriate shape.
        """
        if data.dim() == 1:
            data = data.unsqueeze(1)

        n_samples, n_features = data.shape
        n_components = means.size(0)
        device = data.device

        # Assign points to nearest mean
        distances = torch.cdist(data, means)
        labels = torch.argmin(distances, dim=1)

        if covariance_type == 'full':
            new_covs = []
            for k in range(n_components):
                cluster_mask = (labels == k)
                if not torch.any(cluster_mask):
                    cov_k = torch.eye(n_features, device=device) * (1.0 + reg_covar)
                else:
                    cluster_data = data[cluster_mask]
                    cov_k = torch.cov(cluster_data.T)
                    # Ensure cov_k is always a matrix (for n_features=1, torch.cov returns scalar)
                    if cov_k.ndim == 0:
                        cov_k = cov_k.reshape(1, 1)
                    cov_k += reg_covar * torch.eye(n_features, device=device)
                new_covs.append(cov_k)
            return torch.stack(new_covs, dim=0)

        elif covariance_type == 'diag':
            new_covs = []
            for k in range(n_components):
                cluster_mask = (labels == k)
                if not torch.any(cluster_mask):
                    cov_k = torch.ones(n_features, device=device) * (1.0 + reg_covar)
                else:
                    cluster_data = data[cluster_mask]
                    cov_mat = torch.cov(cluster_data.T)
                    # Ensure cov_mat is always a matrix (for n_features=1, torch.cov returns scalar)
                    if cov_mat.ndim == 0:
                        cov_k = cov_mat.reshape(1) + reg_covar
                    else:
                        cov_k = torch.diagonal(cov_mat) + reg_covar
                new_covs.append(cov_k)
            return torch.stack(new_covs, dim=0)

        elif covariance_type == 'spherical':
            new_covs = []
            for k in range(n_components):
                cluster_mask = (labels == k)
                if not torch.any(cluster_mask):
                    cov_k = 1.0 + reg_covar
                else:
                    cluster_data = data[cluster_mask]
                    cov_mat = torch.cov(cluster_data.T)
                    # For n_features=1, torch.cov returns a scalar
                    if cov_mat.ndim == 0:
                        cov_k = max(cov_mat.item(), reg_covar)
                    else:
                        cov_k = max(torch.mean(torch.diagonal(cov_mat)).item(), reg_covar)
                new_covs.append(torch.tensor(cov_k, device=device))
            return torch.stack(new_covs, dim=0)

        elif covariance_type == 'tied_full':
            sum_cov = torch.zeros(n_features, n_features, device=device)
            for k in range(n_components):
                cluster_mask = (labels == k)
                cluster_data = data[cluster_mask]
                if cluster_data.size(0) > 0:
                    diff = cluster_data - cluster_data.mean(dim=0, keepdim=True)
                    sum_cov += diff.T @ diff
            sum_cov /= n_samples
            sum_cov += reg_covar * torch.eye(n_features, device=device)
            return sum_cov

        elif covariance_type == 'tied_diag':
            sum_diag = torch.zeros(n_features, device=device)
            for k in range(n_components):
                cluster_mask = (labels == k)
                cluster_data = data[cluster_mask]
                if cluster_data.size(0) > 0:
                    diff = cluster_data - cluster_data.mean(dim=0, keepdim=True)
                    sum_diag += (diff * diff).sum(dim=0)
            sum_diag /= n_samples
            sum_diag += reg_covar
            return sum_diag

        elif covariance_type == 'tied_spherical':
            total_sum = 0.0
            for k in range(n_components):
                cluster_mask = (labels == k)
                cluster_data = data[cluster_mask]
                if cluster_data.size(0) > 0:
                    diff = cluster_data - cluster_data.mean(dim=0, keepdim=True)
                    total_sum += diff.pow(2).sum().item()
            var = max(total_sum / (n_samples * n_features), reg_covar)
            return torch.tensor(var, device=device)

        else:
            raise ValueError(f"Unsupported covariance type: {covariance_type}")

Functions

random(data, k) staticmethod

Randomly initialize cluster centers by sampling from the empirical distribution of data.

Mathematically, if \(\bar{x}\) and \(\Sigma\) are the sample mean and covariance of the data, then: $$ \mu_i = \bar{x} + L z, \quad z \sim \mathcal{N}(0, I_d), $$ where \(L\) is the Cholesky factor of \(\Sigma\).

Parameters:

  • data (Tensor, required): A 2D tensor of shape (N, D) representing the dataset.
  • k (int, required): Number of cluster centers to generate.

Returns:

  • Tensor: A (k, D) tensor representing the initial cluster centers.

Source code in tgmm/gmm_init.py
@staticmethod
def random(data: torch.Tensor, k: int) -> torch.Tensor:
    r"""
    Randomly initialize cluster centers by sampling from the empirical
    distribution of ``data``.

    Mathematically, if $\bar{x}$ and $\Sigma$ are the sample mean and covariance
    of the data, then:
    $$
    \mu_i = \bar{x} + L z, \quad z \sim \mathcal{N}(0, I_d),
    $$
    where $L$ is the Cholesky factor of $\Sigma$.

    Parameters
    ----------
    data : torch.Tensor
        A 2D tensor of shape (N, D) representing the dataset.
    k : int
        Number of cluster centers to generate.

    Returns
    -------
    torch.Tensor
        A (k, D) tensor representing the initial cluster centers.
    """
    mu = torch.mean(data, dim=0)
    if data.dim() == 1:
        cov = torch.var(data)
        samples = torch.randn(k, device=data.device) * torch.sqrt(cov)
    else:
        cov = torch.cov(data.t())
        samples = torch.randn(k, data.size(1), device=data.device) @ torch.linalg.cholesky(cov).t()
    samples += mu
    return samples
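
As a torch-free illustration of the rule \(\mu_i = \bar{x} + L z\) above, here is a 2-D sketch with a hand-rolled Cholesky factor (the helper name and numbers are illustrative, not part of tgmm):

```python
import math
import random

def random_init_sketch(data, k, seed=0):
    """Draw k centers from N(mean, cov) of 2-D data via mu + L z."""
    rng = random.Random(seed)
    n = len(data)
    mx = sum(p[0] for p in data) / n
    my = sum(p[1] for p in data) / n
    # Sample covariance entries (unbiased, divisor n - 1), as torch.cov uses.
    sxx = sum((p[0] - mx) ** 2 for p in data) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in data) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in data) / (n - 1)
    # Cholesky factor L of [[sxx, sxy], [sxy, syy]].
    l11 = math.sqrt(sxx)
    l21 = sxy / l11
    l22 = math.sqrt(syy - l21 ** 2)
    centers = []
    for _ in range(k):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        centers.append((mx + l11 * z1, my + l21 * z1 + l22 * z2))
    return centers
```

The torch version does the same thing batched on-device, with `torch.linalg.cholesky` handling arbitrary D.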

points(data, k) staticmethod

Initialize cluster centers by randomly selecting existing data points.

Parameters:

  • data (Tensor, required): A 2D tensor of shape (N, D) representing the dataset.
  • k (int, required): Number of cluster centers to generate.

Returns:

  • Tensor: A (k, D) tensor representing the initial cluster centers.

Source code in tgmm/gmm_init.py
@staticmethod
def points(data: torch.Tensor, k: int) -> torch.Tensor:
    r"""
    Initialize cluster centers by randomly selecting existing data points.

    Parameters
    ----------
    data : torch.Tensor
        A 2D tensor of shape (N, D) representing the dataset.
    k : int
        Number of cluster centers to generate.

    Returns
    -------
    torch.Tensor
        A (k, D) tensor representing the initial cluster centers.
    """
    indices = torch.randperm(data.size(0), device=data.device)[:k]
    return data[indices]
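
The strategy is simply sampling without replacement; a plain-Python equivalent (hypothetical helper, not part of the API):

```python
import random

def points_sketch(data, k, seed=0):
    """Pick k distinct data points as the initial centers."""
    return random.Random(seed).sample(data, k)
```

`torch.randperm(...)[:k]` in the source does the same selection on-device.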

kpp(data, k) staticmethod

Initialize cluster centers using the k-means++ algorithm.

The first center is chosen uniformly at random. Subsequent centers are chosen with probability proportional to the squared distance from the nearest existing center: $$ P(x_j) = \frac{D(x_j)}{\sum_{m=1}^{N} D(x_m)}, \quad D(x_j) = \min_{l} \|x_j - \mu_l\|^2. $$

Parameters:

  • data (Tensor, required): A 2D tensor of shape (N, D) representing the dataset.
  • k (int, required): Number of cluster centers to generate.

Returns:

  • Tensor: A (k, D) tensor representing the initial cluster centers.

Source code in tgmm/gmm_init.py
@staticmethod
def kpp(data: torch.Tensor, k: int) -> torch.Tensor:
    r"""
    Initialize cluster centers using the k-means++ algorithm.

    The first center is chosen uniformly at random. Subsequent centers are chosen
    with probability proportional to the squared distance from the nearest existing
    center:
    $$
    P(x_j) = \frac{D(x_j)}{\sum_{m=1}^{N} D(x_m)}, \quad D(x_j) = \min_{l} \|x_j - \mu_l\|^2.
    $$

    Parameters
    ----------
    data : torch.Tensor
        A 2D tensor of shape (N, D) representing the dataset.
    k : int
        Number of cluster centers to generate.

    Returns
    -------
    torch.Tensor
        A (k, D) tensor representing the initial cluster centers.
    """
    n_samples, _ = data.shape
    centroids = torch.empty((k, data.size(1)), device=data.device)

    # Pick the first center uniformly at random
    initial_idx = torch.randint(0, n_samples, (1,), device=data.device)
    centroids[0] = data[initial_idx]

    for i in range(1, k):
        dist_sq = torch.cdist(data, centroids[:i]).pow(2).min(dim=1)[0]
        probabilities = dist_sq / dist_sq.sum()
        selected_idx = torch.multinomial(probabilities, 1)
        centroids[i] = data[selected_idx]

    return centroids
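
The same sampling scheme can be sketched in plain Python with `random.choices`, which normalizes the weights for us (the helper name is illustrative, not part of tgmm):

```python
import math
import random

def kpp_sketch(data, k, seed=0):
    """k-means++ seeding: sample each new center with probability
    proportional to its squared distance to the nearest chosen center."""
    rng = random.Random(seed)
    centers = [rng.choice(data)]
    for _ in range(1, k):
        # D(x) = min squared distance to the centers chosen so far.
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in data]
        centers.append(rng.choices(data, weights=d2, k=1)[0])
    return centers
```

The torch version replaces the Python loops over points with a single `torch.cdist` call and `torch.multinomial` sampling.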

kmeans(data, k, max_iter=1000, atol=0.0001) staticmethod

Initialize cluster centers by running the k-means algorithm on data.

Starting from a k-means++ initialization, k-means iteratively refines the centers by:

  1. Assignment:
    \(c_j = \arg\min_{i} \|x_j - \mu_i\|^2,\) for each data point \(x_j\).

  2. Update:
    \(\mu_i = \frac{1}{|C_i|} \sum_{x_j \in C_i} x_j,\) where \(C_i\) is the set of points assigned to center \(\mu_i\).

The algorithm stops when the centers move by less than the specified tolerance.

Parameters:

  • data (Tensor, required): A 2D tensor of shape (N, D) representing the dataset.
  • k (int, required): Number of cluster centers to generate.
  • max_iter (int, default 1000): Maximum number of iterations.
  • atol (float, default 0.0001): Convergence tolerance.

Returns:

  • Tensor: A (k, D) tensor representing the final cluster centers.

Source code in tgmm/gmm_init.py
@staticmethod
def kmeans(data: torch.Tensor, k: int, max_iter: int = 1000, atol: float = 1e-4) -> torch.Tensor:
    r"""
    Initialize cluster centers by running the k-means algorithm on ``data``.

    Starting from a k-means++ initialization, k-means iteratively refines the centers by:

    1. **Assignment:**  
       $c_j = \arg\min_{i} \|x_j - \mu_i\|^2,$ for each data point $x_j$.

    2. **Update:**  
       $\mu_i = \frac{1}{|C_i|} \sum_{x_j \in C_i} x_j,$ where $C_i$ is the set of points assigned to center $\mu_i$.

    The algorithm stops when the centers move by less than the specified tolerance.

    Parameters
    ----------
    data : torch.Tensor
        A 2D tensor of shape (N, D) representing the dataset.
    k : int
        Number of cluster centers to generate.
    max_iter : int, optional
        Maximum number of iterations (default is 1000).
    atol : float, optional
        Convergence tolerance (default is $1\times10^{-4}$).

    Returns
    -------
    torch.Tensor
        A (k, D) tensor representing the final cluster centers.
    """
    centroids = GMMInitializer.kpp(data, k)

    for _ in range(max_iter):
        distances = torch.cdist(data, centroids)
        labels = torch.argmin(distances, dim=1)
        # Keep the previous centroid when a cluster receives no points,
        # otherwise the mean over an empty selection would be NaN
        new_centroids = torch.stack([
            data[labels == i].mean(dim=0) if (labels == i).any() else centroids[i]
            for i in range(k)
        ])
        if torch.allclose(centroids, new_centroids, atol=atol):
            break
        centroids = new_centroids

    return centroids
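The assignment/update loop can be illustrated on synthetic data. The sketch below runs plain Lloyd iterations from hand-picked seed centers; the actual method additionally seeds with k-means++ via `GMMInitializer.kpp`:

```python
import torch

torch.manual_seed(0)
# Two tight, well-separated blobs around (-5, -5) and (5, 5)
data = torch.cat([torch.randn(100, 2) * 0.1 - 5.0,
                  torch.randn(100, 2) * 0.1 + 5.0])

centroids = torch.tensor([[-1.0, 0.0], [1.0, 0.0]])
for _ in range(1000):
    labels = torch.cdist(data, centroids).argmin(dim=1)  # assignment step
    new_centroids = torch.stack([data[labels == i].mean(dim=0)
                                 for i in range(2)])     # update step
    if torch.allclose(centroids, new_centroids, atol=1e-4):  # convergence
        break
    centroids = new_centroids

print(centroids)
```

With this data the loop converges in a handful of iterations to the two blob means.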

maxdist(data, k) staticmethod

A modified k-means++ initialization that maximizes the minimum distance between centers.

After randomly selecting the first center, each subsequent center is chosen as: $$ \mu_i = \arg\max_{x \in \mathcal{D}} \min_{l=1,\dots,i-1} |x - \mu_l|, $$ ensuring that the new center is as far as possible from the existing centers. Finally, the first center is reselected as: $$ \mu_1 = \arg\max_{x \in \mathcal{D}} \min_{l=2}^{k} |x - \mu_l|. $$

Parameters:

Name Type Description Default
data Tensor

A 2D tensor of shape (N, D).

required
k int

Number of cluster centers.

required

Returns:

Type Description
Tensor

A (k, D) tensor representing the initial cluster centers.

Source code in tgmm/gmm_init.py
@staticmethod
def maxdist(data: torch.Tensor, k: int) -> torch.Tensor:
    r"""
    A modified k-means++ initialization that maximizes the minimum distance
    between centers.

    After randomly selecting the first center, each subsequent center is chosen as:
    $$
    \mu_i = \arg\max_{x \in \mathcal{D}} \min_{l=1,\dots,i-1} \|x - \mu_l\|,
    $$
    ensuring that the new center is as far as possible from the existing centers.
    Finally, the first center is reselected as:
    $$
    \mu_1 = \arg\max_{x \in \mathcal{D}} \min_{l=2}^{k} \|x - \mu_l\|.
    $$

    Parameters
    ----------
    data : torch.Tensor
        A 2D tensor of shape (N, D).
    k : int
        Number of cluster centers.

    Returns
    -------
    torch.Tensor
        A (k, D) tensor representing the initial cluster centers.
    """
    n_samples, _ = data.shape
    centroids = torch.empty((k, data.size(1)), device=data.device)

    initial_idx = torch.randint(0, n_samples, (1,), device=data.device)
    centroids[0] = data[initial_idx]

    for i in range(1, k):
        dist_sq = torch.cdist(data, centroids[:i]).pow(2)
        min_dist = dist_sq.min(dim=1)[0]
        selected_idx = torch.argmax(min_dist)
        centroids[i] = data[selected_idx]

    dist_sq_to_first = torch.cdist(data, centroids[1:]).pow(2)
    min_dist_to_first = dist_sq_to_first.min(dim=1)[0]
    new_first_idx = torch.argmax(min_dist_to_first)
    centroids[0] = data[new_first_idx]

    return centroids
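The farthest-point behaviour is easiest to see on well-separated clusters: unlike plain k-means++, maxdist deterministically spreads centers out after the random first pick. The helper name `maxdist_sketch` below is illustrative, not part of the API:

```python
import torch

def maxdist_sketch(data: torch.Tensor, k: int) -> torch.Tensor:
    """Standalone sketch of the farthest-point selection shown above."""
    centroids = torch.empty((k, data.size(1)), device=data.device)
    centroids[0] = data[torch.randint(0, data.size(0), (1,))]
    for i in range(1, k):
        # Pick the point farthest from its nearest existing center
        min_dist = torch.cdist(data, centroids[:i]).min(dim=1)[0]
        centroids[i] = data[min_dist.argmax()]
    # Reselect the first center against centers 2..k
    min_dist = torch.cdist(data, centroids[1:]).min(dim=1)[0]
    centroids[0] = data[min_dist.argmax()]
    return centroids

torch.manual_seed(0)
# Four tight clusters at the corners of a square
corners = torch.tensor([[-5., -5.], [-5., 5.], [5., -5.], [5., 5.]])
data = corners.repeat_interleave(50, dim=0) + 0.1 * torch.randn(200, 2)
centers = maxdist_sketch(data, 4)
# Exactly one center lands in each corner cluster
print(torch.cdist(centers, corners).argmin(dim=1).sort()[0])  # tensor([0, 1, 2, 3])
```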

init_weights_uniform(n_components, device) staticmethod

Initialize weights uniformly (equal weights for all components).

Parameters:

Name Type Description Default
n_components int

Number of mixture components.

required
device device

Device to create the tensor on.

required

Returns:

Type Description
Tensor

Uniform weights of shape (n_components,).

Source code in tgmm/gmm_init.py
@staticmethod
def init_weights_uniform(n_components: int, device: torch.device) -> torch.Tensor:
    r"""
    Initialize weights uniformly (equal weights for all components).

    Parameters
    ----------
    n_components : int
        Number of mixture components.
    device : torch.device
        Device to create the tensor on.

    Returns
    -------
    torch.Tensor
        Uniform weights of shape (n_components,).
    """
    return torch.full(
        (n_components,),
        1.0 / n_components,
        dtype=torch.float32,
        device=device
    )

init_weights_random(n_components, device) staticmethod

Initialize weights randomly from a Dirichlet distribution.

Parameters:

Name Type Description Default
n_components int

Number of mixture components.

required
device device

Device to create the tensor on.

required

Returns:

Type Description
Tensor

Random weights of shape (n_components,) that sum to 1.

Source code in tgmm/gmm_init.py
@staticmethod
def init_weights_random(n_components: int, device: torch.device) -> torch.Tensor:
    r"""
    Initialize weights randomly from a Dirichlet distribution.

    Parameters
    ----------
    n_components : int
        Number of mixture components.
    device : torch.device
        Device to create the tensor on.

    Returns
    -------
    torch.Tensor
        Random weights of shape (n_components,) that sum to 1.
    """
    alpha = torch.ones(n_components, device=device)
    weights = torch.distributions.Dirichlet(alpha).sample()
    return weights.float()
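With all concentration parameters equal to 1, the Dirichlet is uniform over the probability simplex, so every valid weight vector is equally likely. A minimal sketch of the same draw:

```python
import torch

torch.manual_seed(0)
# Symmetric Dirichlet(1, ..., 1): uniform over the probability simplex
alpha = torch.ones(4)
weights = torch.distributions.Dirichlet(alpha).sample()
print(weights, weights.sum())
```

The sampled weights are strictly positive and sum to 1 up to floating-point error.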

init_weights_from_clusters(data, means) staticmethod

Initialize weights proportionally to cluster sizes based on k-means assignment.

Parameters:

Name Type Description Default
data Tensor

Input data of shape (n_samples, n_features).

required
means Tensor

Cluster means of shape (n_components, n_features).

required

Returns:

Type Description
Tensor

Weights of shape (n_components,) proportional to cluster sizes.

Source code in tgmm/gmm_init.py
@staticmethod
def init_weights_from_clusters(data: torch.Tensor, means: torch.Tensor) -> torch.Tensor:
    r"""
    Initialize weights proportionally to cluster sizes based on k-means assignment.

    Parameters
    ----------
    data : torch.Tensor
        Input data of shape (n_samples, n_features).
    means : torch.Tensor
        Cluster means of shape (n_components, n_features).

    Returns
    -------
    torch.Tensor
        Weights of shape (n_components,) proportional to cluster sizes.
    """
    if data.dim() == 1:
        data = data.unsqueeze(1)

    n_components = means.size(0)

    # Assign points to nearest mean
    distances = torch.cdist(data, means)
    labels = torch.argmin(distances, dim=1)

    # Count points per cluster
    counts = torch.bincount(labels, minlength=n_components).float()

    # Handle empty clusters
    counts = torch.clamp(counts, min=1e-10)

    # Normalize to get weights
    return counts / counts.sum()
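The assign-and-count logic above can be sketched directly; here with 30 points near one mean and 10 near the other, the weights come out 3:1:

```python
import torch

torch.manual_seed(0)
# 30 points near the first mean, 10 near the second
data = torch.cat([torch.randn(30, 2) * 0.1,
                  torch.randn(10, 2) * 0.1 + 4.0])
means = torch.tensor([[0.0, 0.0], [4.0, 4.0]])

labels = torch.cdist(data, means).argmin(dim=1)      # nearest-mean assignment
counts = torch.bincount(labels, minlength=2).float()
counts = counts.clamp(min=1e-10)                     # guard against empty clusters
weights = counts / counts.sum()
print(weights)  # tensor([0.7500, 0.2500])
```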

init_covariances_eye(n_components, n_features, covariance_type, reg_covar, device) staticmethod

Initialize covariances as identity-like matrices/vectors.

Parameters:

Name Type Description Default
n_components int

Number of mixture components.

required
n_features int

Number of features.

required
covariance_type str

Type of covariance ('full', 'diag', 'spherical', 'tied_full', 'tied_diag', 'tied_spherical').

required
reg_covar float

Regularization added to diagonal.

required
device device

Device to create tensors on.

required

Returns:

Type Description
Tensor

Initialized covariances with appropriate shape.

Source code in tgmm/gmm_init.py
@staticmethod
def init_covariances_eye(n_components: int, n_features: int, 
                        covariance_type: str, reg_covar: float, 
                        device: torch.device) -> torch.Tensor:
    r"""
    Initialize covariances as identity-like matrices/vectors.

    Parameters
    ----------
    n_components : int
        Number of mixture components.
    n_features : int
        Number of features.
    covariance_type : str
        Type of covariance ('full', 'diag', 'spherical', 'tied_full', 'tied_diag', 'tied_spherical').
    reg_covar : float
        Regularization added to diagonal.
    device : torch.device
        Device to create tensors on.

    Returns
    -------
    torch.Tensor
        Initialized covariances with appropriate shape.
    """
    if covariance_type == 'full':
        out = []
        for _ in range(n_components):
            mat = torch.eye(n_features, device=device) * (1.0 + reg_covar)
            out.append(mat)
        return torch.stack(out, dim=0)
    elif covariance_type == 'diag':
        return torch.ones(n_components, n_features, device=device) * (1.0 + reg_covar)
    elif covariance_type == 'spherical':
        return torch.ones(n_components, device=device) * (1.0 + reg_covar)
    elif covariance_type == 'tied_full':
        return torch.eye(n_features, device=device) * (1.0 + reg_covar)
    elif covariance_type == 'tied_diag':
        return torch.ones(n_features, device=device) * (1.0 + reg_covar)
    elif covariance_type == 'tied_spherical':
        return torch.tensor(1.0 + reg_covar, device=device)
    else:
        raise ValueError(f"Unsupported covariance type: {covariance_type}")
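The return shape depends only on `covariance_type`; the sketch below tabulates the expected shapes for `k` components and `d` features (the dict is illustrative, not part of the API) and reproduces the `'full'` branch:

```python
import torch

k, d, reg_covar = 3, 2, 1e-6

# Return shapes per covariance_type, for k components and d features
expected_shapes = {
    'full': (k, d, d),
    'diag': (k, d),
    'spherical': (k,),
    'tied_full': (d, d),
    'tied_diag': (d,),
    'tied_spherical': (),
}

# The 'full' branch: one (1 + reg_covar)-scaled identity per component
full = torch.stack([torch.eye(d) * (1.0 + reg_covar) for _ in range(k)])
print(full.shape)  # torch.Size([3, 2, 2])
```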

init_covariances_random(n_components, n_features, covariance_type, reg_covar, device) staticmethod

Initialize covariances randomly as positive semi-definite matrices.

Parameters:

Name Type Description Default
n_components int

Number of mixture components.

required
n_features int

Number of features.

required
covariance_type str

Type of covariance.

required
reg_covar float

Regularization added to diagonal.

required
device device

Device to create tensors on.

required

Returns:

Type Description
Tensor

Random covariances with appropriate shape.

Source code in tgmm/gmm_init.py
@staticmethod
def init_covariances_random(n_components: int, n_features: int,
                           covariance_type: str, reg_covar: float,
                           device: torch.device) -> torch.Tensor:
    r"""
    Initialize covariances randomly as positive semi-definite matrices.

    Parameters
    ----------
    n_components : int
        Number of mixture components.
    n_features : int
        Number of features.
    covariance_type : str
        Type of covariance.
    reg_covar : float
        Regularization added to diagonal.
    device : torch.device
        Device to create tensors on.

    Returns
    -------
    torch.Tensor
        Random covariances with appropriate shape.
    """
    if covariance_type in ('full', 'tied_full'):
        def random_spd(dim):
            A = torch.randn(dim, dim, device=device)
            return A @ A.mT + reg_covar * torch.eye(dim, device=device)

        if covariance_type == 'full':
            return torch.stack([random_spd(n_features) for _ in range(n_components)], dim=0)
        else:
            return random_spd(n_features)
    elif covariance_type in ('diag', 'tied_diag'):
        shape = (n_components, n_features) if covariance_type == 'diag' else (n_features,)
        return torch.rand(shape, device=device) + reg_covar
    elif covariance_type in ('spherical', 'tied_spherical'):
        shape = (n_components,) if (covariance_type == 'spherical') else ()
        return torch.rand(shape, device=device) + (1.0 + reg_covar)
    else:
        raise ValueError(f"Unsupported covariance type: {covariance_type}")
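The `'full'`/`'tied_full'` construction is worth spelling out: \(A A^\top\) is positive semi-definite for any square \(A\), and adding `reg_covar * I` makes the result strictly positive definite. A quick eigenvalue check of that claim:

```python
import torch

torch.manual_seed(0)
d, reg_covar = 3, 1e-6

# A @ A^T is positive semi-definite by construction;
# reg_covar * I shifts all eigenvalues strictly above zero
A = torch.randn(d, d)
cov = A @ A.mT + reg_covar * torch.eye(d)

eigvals = torch.linalg.eigvalsh(cov)
print(eigvals.min().item() > 0)  # True
```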

init_covariances_global(data, n_components, covariance_type, reg_covar) staticmethod

Initialize covariances using global data covariance.

Parameters:

Name Type Description Default
data Tensor

Input data of shape (n_samples, n_features).

required
n_components int

Number of mixture components.

required
covariance_type str

Type of covariance.

required
reg_covar float

Regularization added to diagonal.

required

Returns:

Type Description
Tensor

Initialized covariances based on global data covariance.

Source code in tgmm/gmm_init.py
@staticmethod
def init_covariances_global(data: torch.Tensor, n_components: int,
                           covariance_type: str, reg_covar: float) -> torch.Tensor:
    r"""
    Initialize covariances using global data covariance.

    Parameters
    ----------
    data : torch.Tensor
        Input data of shape (n_samples, n_features).
    n_components : int
        Number of mixture components.
    covariance_type : str
        Type of covariance.
    reg_covar : float
        Regularization added to diagonal.

    Returns
    -------
    torch.Tensor
        Initialized covariances based on global data covariance.
    """
    if data.dim() == 1:
        data = data.unsqueeze(1)

    n_features = data.size(1)
    device = data.device

    # Compute global covariance matrix
    global_cov = torch.cov(data.T)
    # torch.cov returns a 0-dim tensor for a single feature; promote to (1, 1)
    if global_cov.ndim == 0:
        global_cov = global_cov.reshape(1, 1)
    global_cov += reg_covar * torch.eye(n_features, device=device)

    if covariance_type == 'full':
        return global_cov.unsqueeze(0).repeat(n_components, 1, 1)
    elif covariance_type == 'diag':
        diag_vals = torch.diagonal(global_cov)
        return diag_vals.unsqueeze(0).repeat(n_components, 1)
    elif covariance_type == 'spherical':
        avg_var = torch.mean(torch.diagonal(global_cov))
        return torch.full((n_components,), avg_var.item(), device=device)
    elif covariance_type == 'tied_full':
        return global_cov
    elif covariance_type == 'tied_diag':
        return torch.diagonal(global_cov)
    elif covariance_type == 'tied_spherical':
        return torch.mean(torch.diagonal(global_cov))
    else:
        raise ValueError(f"Unsupported covariance type: {covariance_type}")
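The branches above all derive from one `torch.cov` call; the sketch below reproduces a few of them on data with deliberately unequal feature variances:

```python
import torch

torch.manual_seed(0)
# Feature variances of roughly 1 and 9
data = torch.randn(500, 2) * torch.tensor([1.0, 3.0])
n_components, reg_covar = 3, 1e-6

global_cov = torch.cov(data.T) + reg_covar * torch.eye(2)

full = global_cov.unsqueeze(0).repeat(n_components, 1, 1)               # 'full'
diag = torch.diagonal(global_cov).unsqueeze(0).repeat(n_components, 1)  # 'diag'
tied_spherical = torch.diagonal(global_cov).mean()            # 'tied_spherical'
print(full.shape, diag.shape)  # torch.Size([3, 2, 2]) torch.Size([3, 2])
```

Every component starts from the same (regularized) global covariance; only the shape differs by type.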

init_covariances_empirical(data, means, covariance_type, reg_covar) staticmethod

Initialize covariances empirically from cluster assignments.

Assigns each data point to its nearest mean and computes cluster-wise empirical covariance matrices.

Parameters:

Name Type Description Default
data Tensor

Input data of shape (n_samples, n_features).

required
means Tensor

Cluster means of shape (n_components, n_features).

required
covariance_type str

Type of covariance.

required
reg_covar float

Regularization added to diagonal.

required

Returns:

Type Description
Tensor

Empirical covariances with appropriate shape.

Source code in tgmm/gmm_init.py
@staticmethod
def init_covariances_empirical(data: torch.Tensor, means: torch.Tensor,
                              covariance_type: str, reg_covar: float) -> torch.Tensor:
    r"""
    Initialize covariances empirically from cluster assignments.

    Assigns each data point to its nearest mean and computes cluster-wise
    empirical covariance matrices.

    Parameters
    ----------
    data : torch.Tensor
        Input data of shape (n_samples, n_features).
    means : torch.Tensor
        Cluster means of shape (n_components, n_features).
    covariance_type : str
        Type of covariance.
    reg_covar : float
        Regularization added to diagonal.

    Returns
    -------
    torch.Tensor
        Empirical covariances with appropriate shape.
    """
    if data.dim() == 1:
        data = data.unsqueeze(1)

    n_samples, n_features = data.shape
    n_components = means.size(0)
    device = data.device

    # Assign points to nearest mean
    distances = torch.cdist(data, means)
    labels = torch.argmin(distances, dim=1)

    if covariance_type == 'full':
        new_covs = []
        for k in range(n_components):
            cluster_mask = (labels == k)
            if not torch.any(cluster_mask):
                cov_k = torch.eye(n_features, device=device) * (1.0 + reg_covar)
            else:
                cluster_data = data[cluster_mask]
                cov_k = torch.cov(cluster_data.T)
                # Ensure cov_k is always a matrix (for n_features=1, torch.cov returns scalar)
                if cov_k.ndim == 0:
                    cov_k = cov_k.reshape(1, 1)
                cov_k += reg_covar * torch.eye(n_features, device=device)
            new_covs.append(cov_k)
        return torch.stack(new_covs, dim=0)

    elif covariance_type == 'diag':
        new_covs = []
        for k in range(n_components):
            cluster_mask = (labels == k)
            if not torch.any(cluster_mask):
                cov_k = torch.ones(n_features, device=device) * (1.0 + reg_covar)
            else:
                cluster_data = data[cluster_mask]
                cov_mat = torch.cov(cluster_data.T)
                # Ensure cov_mat is always a matrix (for n_features=1, torch.cov returns scalar)
                if cov_mat.ndim == 0:
                    cov_k = cov_mat.reshape(1) + reg_covar
                else:
                    cov_k = torch.diagonal(cov_mat) + reg_covar
            new_covs.append(cov_k)
        return torch.stack(new_covs, dim=0)

    elif covariance_type == 'spherical':
        new_covs = []
        for k in range(n_components):
            cluster_mask = (labels == k)
            if not torch.any(cluster_mask):
                cov_k = 1.0 + reg_covar
            else:
                cluster_data = data[cluster_mask]
                cov_mat = torch.cov(cluster_data.T)
                # For n_features=1, torch.cov returns a scalar
                if cov_mat.ndim == 0:
                    cov_k = max(cov_mat.item(), reg_covar)
                else:
                    cov_k = max(torch.mean(torch.diagonal(cov_mat)).item(), reg_covar)
            new_covs.append(torch.tensor(cov_k, device=device))
        return torch.stack(new_covs, dim=0)

    elif covariance_type == 'tied_full':
        sum_cov = torch.zeros(n_features, n_features, device=device)
        for k in range(n_components):
            cluster_mask = (labels == k)
            cluster_data = data[cluster_mask]
            if cluster_data.size(0) > 0:
                diff = cluster_data - cluster_data.mean(dim=0, keepdim=True)
                sum_cov += diff.T @ diff
        sum_cov /= n_samples
        sum_cov += reg_covar * torch.eye(n_features, device=device)
        return sum_cov

    elif covariance_type == 'tied_diag':
        sum_diag = torch.zeros(n_features, device=device)
        for k in range(n_components):
            cluster_mask = (labels == k)
            cluster_data = data[cluster_mask]
            if cluster_data.size(0) > 0:
                diff = cluster_data - cluster_data.mean(dim=0, keepdim=True)
                sum_diag += (diff * diff).sum(dim=0)
        sum_diag /= n_samples
        sum_diag += reg_covar
        return sum_diag

    elif covariance_type == 'tied_spherical':
        total_sum = 0.0
        for k in range(n_components):
            cluster_mask = (labels == k)
            cluster_data = data[cluster_mask]
            if cluster_data.size(0) > 0:
                diff = cluster_data - cluster_data.mean(dim=0, keepdim=True)
                total_sum += diff.pow(2).sum().item()
        var = max(total_sum / (n_samples * n_features), reg_covar)
        return torch.tensor(var, device=device)

    else:
        raise ValueError(f"Unsupported covariance type: {covariance_type}")
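The effect of the per-cluster estimation is that each component picks up the spread of its own cluster, not the global spread. A minimal sketch of the `'full'` path on two clusters with visibly different variances:

```python
import torch

torch.manual_seed(0)
# Two clusters with clearly different spreads (std 0.5 vs 2.0)
data = torch.cat([torch.randn(200, 2) * 0.5,
                  torch.randn(200, 2) * 2.0 + 20.0])
means = torch.tensor([[0.0, 0.0], [20.0, 20.0]])
reg_covar = 1e-6

labels = torch.cdist(data, means).argmin(dim=1)  # nearest-mean assignment
covs = torch.stack([
    torch.cov(data[labels == j].T) + reg_covar * torch.eye(2)
    for j in range(2)
])
# Per-cluster diagonals approximate the true variances 0.25 and 4.0
print(covs[0].diagonal().mean().item(), covs[1].diagonal().mean().item())
```

(The empty-cluster and 1-D fallbacks in the method above are omitted here for brevity.)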