I've been reading about clustering lately and there is a passage I can't follow. Can someone explain it?

asyuae 2012-09-18 07:42:42
A clustering algorithm based on decision trees:
The basic idea is that we regard each data record (or point) in the
dataset to have a class Y. We then assume that the data space is
uniformly distributed with another type of points, called nonexisting
points. We give them the class, N. With the N points
added to the original data space, our problem of partitioning the
data space into data (dense) regions and empty (sparse) regions
becomes a classification problem. A decision tree algorithm can
be applied to solve the problem. However, for the technique to
work many important issues have to be addressed (see Section 2).
We now use an example to show the intuition behind the
proposed technique. Figure 1(A) gives a 2-dimensional space,
which has 24 data (Y) points. Two clusters exist in the space. We
then add some uniformly distributed N points (represented by “o”)
to the data space (Figure 1(B)). With the augmented dataset, we
can run a decision tree algorithm to obtain a partitioning of the
space (Figure 1(B)). The two clusters are identified.
The reason that this technique works is that if there are clusters in
the data, the data points cannot be uniformly distributed in the
entire space. By adding some uniformly distributed N points, we
can isolate the clusters because within each cluster region there
are more Y points than N points. The decision tree technique is
well known for this task.
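
A quick back-of-the-envelope check of the "more Y points than N points" claim, written as a small Python sketch. The 10 x 10 data space, the cluster box, the 24 N points and the 12 Y points inside the box are all assumptions made up for illustration; the passage gives none of these numbers. `expected_n_in_box` is a hypothetical helper, not something from the paper.

```python
def expected_n_in_box(n_total, box, space):
    """Expected number of uniformly scattered N points that land inside `box`.

    Rectangles are (x0, y0, x1, y1). For a uniform distribution the expected
    count is simply n_total times the fraction of the space the box covers.
    """
    bx0, by0, bx1, by1 = box
    sx0, sy0, sx1, sy1 = space
    box_area = (bx1 - bx0) * (by1 - by0)
    space_area = (sx1 - sx0) * (sy1 - sy0)
    return n_total * box_area / space_area


space = (0.0, 0.0, 10.0, 10.0)        # assumed data space
cluster_box = (1.5, 1.5, 3.5, 3.5)    # assumed region around one cluster
y_in_box = 12                          # assumed: half of the 24 Y points
n_expected = expected_n_in_box(24, cluster_box, space)

print("Y points in box:", y_in_box)
print("expected N points in box:", n_expected)
```

Under these assumptions the box covers 4% of the space, so fewer than one N point is expected there against 12 Y points, which is why a classifier will tend to label that region Y.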
ri_aje 2012-09-18
[Quote=Quoting reply #6:]

Please help
The basic idea is that we regard each data record (or point) in the
dataset to have a class Y. We then assume that the data space is
uniformly distributed with another type of points, called n……
[/Quote]
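Here is a rough translation; make do with it. Part of it is guesswork: without the context and the figures I can't follow everything, and the original is vague in places.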

The basic idea is that we regard each data record (or point) in the
dataset to have a class Y.
Treat every element (point) in the dataset as having class Y.

We then assume that the data space is
uniformly distributed with another type of points, called nonexisting
points. We give them the class, N.
Assume that points of another type are uniformly distributed over the whole data space. They have class N.

With the N points
added to the original data space, our problem of partitioning the
data space into data (dense) regions and empty (sparse) regions
becomes a classification problem.
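Once the class N points are added to the data space, the problem of partitioning the data space into dense and sparse regions turns into a classification problem.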

A decision tree algorithm can
be applied to solve the problem.
A decision tree algorithm can be used to solve this problem.
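
The steps above can be sketched in Python roughly as follows. This is only a minimal illustration, not the paper's actual algorithm: it uses a plain greedy Gini-impurity tree with axis-aligned cuts, and it makes no attempt to handle the issues the passage defers to Section 2. All names (`gini`, `best_split`, `build`, `predict`, `make_dataset`), the cluster centres, the number of N points and the `max_depth` / `min_size` stopping rules are assumptions of mine.

```python
import random


def gini(labels):
    """Gini impurity of a list of "Y"/"N" labels (0.0 means pure)."""
    if not labels:
        return 0.0
    p = labels.count("Y") / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(points, labels):
    """Find the axis-aligned cut (dim, threshold) with the lowest weighted Gini.

    Returns None when no cut improves on the unsplit node.
    """
    best, best_score = None, gini(labels)
    for dim in range(len(points[0])):
        values = sorted({p[dim] for p in points})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0  # candidate cut halfway between neighbours
            left = [l for p, l in zip(points, labels) if p[dim] <= t]
            right = [l for p, l in zip(points, labels) if p[dim] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score - 1e-12:
                best, best_score = (dim, t), score
    return best


def build(points, labels, depth=0, max_depth=4, min_size=4):
    """Grow a tree. A leaf is "Y" or "N"; an inner node is (dim, t, left, right)."""
    majority = "Y" if labels.count("Y") >= labels.count("N") else "N"
    if depth >= max_depth or len(points) < 2 * min_size:
        return majority
    split = best_split(points, labels)
    if split is None:
        return majority
    dim, t = split
    left = [(p, l) for p, l in zip(points, labels) if p[dim] <= t]
    right = [(p, l) for p, l in zip(points, labels) if p[dim] > t]
    return (dim, t,
            build([p for p, _ in left], [l for _, l in left],
                  depth + 1, max_depth, min_size),
            build([p for p, _ in right], [l for _, l in right],
                  depth + 1, max_depth, min_size))


def predict(tree, point):
    """Walk down the tree and return the leaf label for `point`."""
    while not isinstance(tree, str):
        dim, t, left, right = tree
        tree = left if point[dim] <= t else right
    return tree


def make_dataset(seed=0, n_count=48):
    """Made-up toy data: 24 Y points in two clusters plus uniform N points."""
    rng = random.Random(seed)
    y_points = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
                for cx, cy in ((2.5, 2.5), (7.5, 7.5)) for _ in range(12)]
    n_points = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(n_count)]
    return y_points + n_points, ["Y"] * len(y_points) + ["N"] * n_count


if __name__ == "__main__":
    points, labels = make_dataset()
    tree = build(points, labels)
    # Probe the two assumed cluster centres and one spot in empty space.
    for probe in ((2.5, 2.5), (7.5, 7.5), (2.5, 7.5)):
        print(probe, "->", predict(tree, probe))
```

Leaves labelled Y are the dense regions, that is, the cluster candidates; leaves labelled N are the empty regions. How clean the rectangles come out depends a lot on how many N points you add and on when the tree stops splitting, so treat the printed labels as something to experiment with rather than a guaranteed result.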
asyuae 2012-09-18
Please help
The basic idea is that we regard each data record (or point) in the
dataset to have a class Y. We then assume that the data space is
uniformly distributed with another type of points, called nonexisting
points. We give them the class, N. With the N points
added to the original data space, our problem of partitioning the
data space into data (dense) regions and empty (sparse) regions
becomes a classification problem. A decision tree algorithm can
be applied to solve the problem. However, for the technique to
work many important issues have to be addressed (see Section 2).
It's mainly this passage.
asyuae 2012-09-18
Overall, what does it mean??? I don't understand this passage, so I can't move on to what follows.
ri_aje 2012-09-18
Which sentence exactly don't you understand?
asyuae 2012-09-18
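[Quote=Quoting reply #2:]

Quoting reply #1 by asyuae:
Bumping my own thread. By the way, poor English really is a serious handicap!

With poor English, learning new things is a real struggle, especially topics with little material available in China.
[/Quote]
Yeah, true. This is fairly urgent; can anyone help?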
Gloveing 2012-09-18
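[Quote=Quoting reply #1 by asyuae:]
Bumping my own thread. By the way, poor English really is a serious handicap!
[/Quote]
With poor English, learning new things is a real struggle, especially topics with little material available in China.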
asyuae 2012-09-18
Bumping my own thread. By the way, poor English really is a serious handicap!