Octave で書かれた K-means プログラムの例
Octave で書かれた K-means プログラムの例を示す.K-means はクラスタリングのアルゴリズム. なぜ Octave で書いているかというと,学習用,またいろいろな距離関数 (metric function) を試したいという気持ちがあり, C 言語などにこだわらず(速いことにこだわらず),Octave で書かれたK-means プログラムにも価値があると考えたからです.
* 試しに動作させましたが,Octave でも,むちゃくちゃは遅くありません. 私の Linux マシンで 512×512 のカラー画像をクラスタ数 10 でクラスタリングしたところ 20秒弱.
* とはいえ,性能をとても重視する(ソースコードには興味ない)場合には,「k-means クラスタリングの例 (Open-CV を使用) 」の Web ページに,OpenCV を利用した K-means の Octave プログラムを載せていますので,そちらを見てください.
前準備 (Pre-requisite)
Octave のインストール (Install Octave)
The following code is the K-Means algorithm implementationwrittein in Octave
#
# The following code is the K-Means algorithm implementationwrittein in Octave
# Octave で書かれた K-means アルゴリズムの例
#
# The following code is the K-Means algorithm implementationwrittein in Octave
# Octave で書かれた K-means アルゴリズムの例
function [width, height] = mat_size(X)
height = size(X)(1);
width = size(X)(2);
endfunction
function Metrics = calc_metrics(Data, p)
# metrics between all data points in 'Data' and 'p'.
Diff = (Data - repmat( p, size(Data)(1), 1 ))';
# Euclidian distance
Metrics = sum(Diff .* Diff);
clear Diff
endfunction
function num = find_empty_cluster(Data, assignment, n_clusters)
for i = 1:n_clusters
# select data points in the i-th cluster,
Points = Data( find( assignment(:,2) == i ), : );
if ( size(Points)(1) == 0 )
num = i;
clear Points;
return;
endif
clear Points;
endfor
num = 0;
endfunction
function [NewCentroidPoints, assignment] = kmeans_step(Data, CentroidPoints, n_clusters)
# get the number of data points (= height) and the dimesnion (= width)
[width, height] = mat_size(Data);
# 'D' is a matrix to store the distance between data points and centroids
# データ点と centroids 間の距離を行列 D に格納する
D = zeros(height,n_clusters);
for i = 1:n_clusters
D(:,i) = calc_metrics( Data, CentroidPoints(i,:) );
endfor
# the shortes distance from each data point to any centroids
# 各データポイントから centroids までの最短距離
shortest = min(D')';
# get the centroid number for each data point
# 各データポイントについて,最も近い centroid の番号を得る
[m,n] = find( D == repmat(shortest,1,n_clusters) );
assignment = zeros(height,2);
assignment(:,1) = 1:height; # data point number
assignment(m,2) = n; # cluster number
done = 0;
while ( done == 0 )
# try to find an empty cluster
empty_cluster_num = find_empty_cluster(Data, assignment, n_clusters);
if ( empty_cluster_num > 0 )
# assign the farest data point to the empty cluster
data_point_num = find( shortest == max(shortest) );
assignment(data_point_num, empty_cluster_num);
# new, the farest data point becomes a centroid
shorttest(data_point_num) = 0;
else
done = 1;
endif
endwhile
clear shortest;
clear D;
# new centoid of each cluster
# 各クラスタの centroid を求める
for i = 1:n_clusters
# select data points in the i-th cluster,
Points = Data( find( assignment(:,2) == i ), : );
# compute the centroid of i-th cluster
if ( size(Points)(1) == 1 )
NewCentroidPoints(i,:) = Points;
else
NewCentroidPoints(i,:) = mean(Points);
endif
clear Points;
endfor
endfunction
#'
# Read Color Image Data
[rgb, immap, alpha] = imread("lena_std.jpg");
mono = rgb2gray( rgb );
colormap(gray(256));
# Initialize 3-dimensional data points
# Set color image data into 'Data'
A = double(rgb)/255;
Data = zeros( size(A)(1) * size(A)(2), size(A)(3) );
Data(:,1) = reshape( A(:,:,1), size(A)(1) * size(A)(2), 1 );
Data(:,2) = reshape( A(:,:,2), size(A)(1) * size(A)(2), 1 );
Data(:,3) = reshape( A(:,:,3), size(A)(1) * size(A)(2), 1 );
n_clusters = 10;
# get the number of data points (= height) and the dimesnion (= width)
[width, height] = mat_size(Data);
# clusters not yet constructed.
# select different 'n_clusters' data points number randomly to create initial 'centoroids'
# クラスタはまだできていない
# 'centroids' の初期値を得るために,'n_clusters' 個の異なるデータ点番号をランダムに選ぶ.
CentroidPoints = Data( randperm(height)(1:n_clusters), : );
[NewCentroidPoints, assignment] = kmeans_step(Data, CentroidPoints, n_clusters);
a = 1;
while( a > 0 )
CentroidPoints = NewCentroidPoints;
clear NewCentroidPoints;
old_assignment = assignment(:,2);
clear assignment;
[NewCentroidPoints, assignment] = kmeans_step(Data, CentroidPoints, n_clusters);
# assignment changed ?
# 'assignment' に変更があったか
a = max( abs( old_assignment - assignment(:,2) ) );
endwhile
V = NewCentroidPoints( assignment(:,2),: );
NEWIMAGE = zeros( size(A)(1), size(A)(2), size(A)(3) );
NEWIMAGE(:,:,1) = reshape( V(:,1), size(A)(1), size(A)(2) );
NEWIMAGE(:,:,2) = reshape( V(:,2), size(A)(1), size(A)(2) );
NEWIMAGE(:,:,3) = reshape( V(:,3), size(A)(1), size(A)(2) );
clear V;
imwrite(NEWIMAGE, "new.png");
disp "done"
Input Data Example
Output Data Example