I've been reading about clustering recently and came across a passage I don't understand. Can someone explain it?

Clustering via decision trees:
The basic idea is that we regard each data record (or point) in the
dataset as having a class, Y. We then assume that the data space is
uniformly populated with another type of points, called nonexisting
points, to which we give the class N. With the N points
added to the original data space, our problem of partitioning the
data space into data (dense) regions and empty (sparse) regions
becomes a classification problem. A decision tree algorithm can
be applied to solve the problem. However, for the technique to
work many important issues have to be addressed (see Section 2).
We now use an example to show the intuition behind the
proposed technique. Figure 1(A) gives a 2-dimensional space
containing 24 data (Y) points, which form two clusters. We
then add some uniformly distributed N points (represented by "o")
to the data space (Figure 1(B)). On the augmented dataset, we
can run a decision tree algorithm to obtain a partitioning of the
space (Figure 1(B)). The two clusters are identified.
The reason that this technique works is that if there are clusters in
the data, the data points cannot be uniformly distributed in the
entire space. By adding some uniformly distributed N points, we
can isolate the clusters because within each cluster region there
are more Y points than N points. The decision tree technique is
well known for this task.
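The construction in the passage (add uniform N points, then treat dense-vs-sparse partitioning as Y-vs-N classification) can be sketched as follows. This is a minimal illustration assuming scikit-learn is available; the cluster locations, point counts, and tree depth are made up for the example and are not from the paper:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Two dense clusters of real (Y) points in a 2-D space,
# analogous to the 24 data points in Figure 1(A).
cluster1 = rng.normal(loc=[2.0, 2.0], scale=0.3, size=(12, 2))
cluster2 = rng.normal(loc=[7.0, 7.0], scale=0.3, size=(12, 2))
Y_points = np.vstack([cluster1, cluster2])

# Uniformly distributed nonexisting (N) points over the whole space,
# analogous to the "o" points added in Figure 1(B).
N_points = rng.uniform(low=0.0, high=9.0, size=(24, 2))

# Augmented dataset: partitioning into dense and sparse regions
# is now an ordinary two-class classification problem.
X = np.vstack([Y_points, N_points])
labels = np.array(["Y"] * len(Y_points) + ["N"] * len(N_points))

# A decision tree cuts the space into axis-aligned regions; inside a
# cluster region Y points outnumber N points, so that region is
# labeled Y, and the sparse remainder is labeled N.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, labels)

# Querying the tree at the two cluster centers and at an empty spot
# between them shows which regions it considers dense.
print(tree.predict([[2.0, 2.0], [7.0, 7.0], [4.5, 4.5]]))
```

The leaves predicting Y correspond to the dense (cluster) regions, so reading the cluster boundaries off the tree is just reading the split thresholds along each root-to-leaf path.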