王飞云

How to use the clustering tools in Mahout 0.9

What is Mahout?
How do you use the clustering tools in Mahout 0.9?

Replies (1)

李兆水

2021-9-23 10:13:22
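What is Mahout?
"Apache Mahout™ project's goal is to build a scalable machine learning library"
Let me expand on that:
(1) Mahout is an open-source Apache project that bundles a large number of machine learning algorithms.
(2) Most of its algorithms can run on Hadoop and scale well, which makes machine learning on big data feasible.

This post looks at how to use the clustering tools in Mahout 0.9.
I. Data preparation
The input to Mahout's clustering algorithms is a List<Vector>, so every document to be clustered must first be represented as a vector.
Here we use the classic Reuters21578 text corpus and try to cluster the news articles by their text.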
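To make "representing a document as a vector" concrete, here is a minimal sketch in plain Python. It is only an illustration of the idea, not Mahout's API: the function names build_vocabulary and to_term_vector and the three toy documents are assumptions made up for this example.

```python
from collections import Counter

def build_vocabulary(docs):
    """Assign each distinct token an integer id, in order of first appearance."""
    vocab = {}
    for doc in docs:
        for token in doc.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab

def to_term_vector(doc, vocab):
    """Return a sparse term-frequency vector as {term_id: count}."""
    counts = Counter(doc.lower().split())
    return {vocab[t]: c for t, c in counts.items() if t in vocab}

# Hypothetical toy corpus, whitespace-tokenized for simplicity.
docs = ["oil prices rise", "oil venture formed", "prices fall"]
vocab = build_vocabulary(docs)
vectors = [to_term_vector(d, vocab) for d in docs]
print(vectors)
```

Each document ends up as a small dictionary keyed by term id, which is the same sparse idea that the real pipeline applies to a much larger vocabulary.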
1. Download the data
axel -n 20 http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.tar.gz
2. Extract the archive
mkdir -p ./reuters-sgm && tar -xzvf ./reuters21578.tar.gz -C ./reuters-sgm
After extraction, reuters-sgm contains a number of *.sgm files, each of which in turn holds many structured documents like the following:
<REUTERS TOPICS="NO" LEWISSPLIT="TRAIN" CGISPLIT="TRAINING-SET" OLDID="5545" NEWID="2">
<DATE>26-FEB-1987 15:02:20.00</DATE>
<TOPICS></TOPICS>
<PLACES><D>usa</D></PLACES>
<PEOPLE></PEOPLE>
<ORGS></ORGS>
<EXCHANGES></EXCHANGES>
<COMPANIES></COMPANIES>
<UNKNOWN>
F Y
f0708 reute
d f BC-STANDARD-OIL-&lt;SRD>-TO   02-26 0082</UNKNOWN>
<TEXT>
<TITLE>STANDARD OIL &lt;SRD> TO FORM FINANCIAL UNIT</TITLE>
<DATELINE>    CLEVELAND, Feb 26 - </DATELINE><BODY>Standard Oil Co and BP North America
Inc said they plan to form a venture to manage the money market
borrowing and investment activities of both companies.
    BP North America is a subsidiary of British Petroleum Co
Plc &lt;BP>, which also owns a 55 pct interest in Standard Oil.
    The venture will be called BP/Standard Financial Trading
and will be operated by Standard Oil under the oversight of a
joint management committee.
Reuter
</BODY></TEXT>
</REUTERS>
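The record above carries its headline in <TITLE> and its article text in <BODY>. As a rough sketch of how those two fields could be pulled out of one record, here is a small Python example. It is an illustration only and not the extractor that ships with Mahout; the function name extract_title_and_body and the shortened sample record are assumptions made for this example.

```python
import html
import re

def extract_title_and_body(record):
    """Return 'title\\n\\nbody' for one <REUTERS> record; a missing field becomes ''."""
    def first(tag):
        # Non-greedy match of the first <TAG>...</TAG> pair, spanning newlines.
        m = re.search(rf"<{tag}>(.*?)</{tag}>", record, re.DOTALL)
        # Decode entities such as &lt; so the text reads naturally.
        return html.unescape(m.group(1)).strip() if m else ""
    return first("TITLE") + "\n\n" + first("BODY")

# Shortened, hypothetical version of the record shown above.
sample = """<REUTERS NEWID="2">
<TEXT>
<TITLE>STANDARD OIL &lt;SRD> TO FORM FINANCIAL UNIT</TITLE>
<BODY>Standard Oil Co and BP North America
Inc said they plan to form a venture.
</BODY></TEXT>
</REUTERS>"""

text = extract_title_and_body(sample)
print(text)
```

A regular expression is enough here because the fields are not nested; a real corpus run would also need to split each *.sgm file into its individual records first.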
In what follows we mainly use the text inside <TITLE> and <BODY>, that is, the headline plus the article body.
3. Extraction
Mahout ships with an extractor for the Reuters corpus described above, so we can use it directly.
mahout org.apache.lucene.benchmark.utils.ExtractReuters ./reuters-sgm ./reuters-out
The extracted results land in the ./reuters-out directory, where each <REUTERS> document becomes a separate file.
There are 21578 txt files in total, i.e. the data set contains 21578 documents :-)
A note on naming: the file ./reuters-out/reut2-006.sgm-246.txt, for example, is the document at index 246 in ./reuters-sgm/reut2-006.sgm, with indices starting from 0.
4. Converting to SequenceFile
For a traditional text clustering algorithm, the next step would be to convert the text into a vector-space representation over words.
But not so fast.
Mahout runs on Hadoop, and HDFS is designed for large files. Copying the 21578 txt files above onto it would be a very poor fit.
Imagine clustering 10 million news articles: would we really copy 10 million files? That would bring down the NameNode.
Mahout therefore uses SequenceFile as its basic data exchange format.
The built-in seqdirectory command (a poorly chosen name; directoryseq would make more sense) performs the text directory -> SequenceFile conversion.
mahout seqdirectory -i file://$(pwd)/reuters-out/ -o file://$(pwd)/reuters-seq/ -c UTF-8 -chunk 64 -xm sequential
The command above hides two big pitfalls that other documents do not explain carefully:
(1) -xm sequential means run locally rather than with MapReduce. With MapReduce we would first have to upload all these small files to HDFS, and then what would be the point of the SequenceFile...
(2) Running in local mode does not make seqdirectory look on the local file system, though. It decides where the files are from the file system prefix of -i and -o. In other words, by default it still looks on HDFS, so the file:// prefix is essential.
The other two parameters:
- -c UTF-8: the character encoding.
- -chunk 64: one chunk per 64 MB; this should equal the HDFS block size or be a multiple of it.
5. Converting to vectors
To accommodate many kinds of data, clustering algorithms mostly take vector-space data as input.
Since we already have the processed SequenceFile, everything from this step on can run on Hadoop.
hadoop dfs -put reuters-seq /user/coder4
Start the text -> Vector conversion:
mahout seq2sparse -i /user/coder4/reuters-seq -o /user/coder4/reuters-sparse -ow --weight tfidf --maxDFPercent 85 --namedVector
The input and output need no explanation. The vector type Mahout produces here is called sparse.
Parameters:
- -ow (or --overwrite): overwrite the output directory even if it already exists.
- --weight (or -wt) tfidf: the weighting formula, which needs no introduction. The other option is tf (recommended for LDA).
- --maxDFPercent (or -x) 85: filters out high-frequency terms; a term whose DF exceeds 85% is no longer emitted as a feature in the vectors.
- --namedVector (or -nv): the output vectors are NamedVectors, i.e. each vector carries extra information (a name identifying its document).
Other options that may be useful:
- --analyzerName (or -a): use a different analyzer (tokenizer).
- --minDF: the minimum DF threshold.
- --minSupport: the minimum support threshold, default 2.
- --maxNGramSize (or -ng): the maximum n-gram size to create, default 1. Setting it to 2 is usually enough.
- --minLLR (or -ml): the minimum log-likelihood ratio, default 1.0. When -ng > 1, a larger value is recommended so that only meaningful n-grams are kept.
- --logNormalize (or -lnorm): whether to apply a log transform to the output vectors.
- --norm (or -n): whether to apply p-norm normalization to the output vectors; no normalization by default.
Take a look at the output:
hadoop dfs -ls /user/coder4/reuters-sparse
Found 7 items
/user/coder4/reuters-sparse/df-count
/user/coder4/reuters-sparse/dictionary.file-0
/user/coder4/reuters-sparse/frequency.file-0
/user/coder4/reuters-sparse/tf-vectors
/user/coder4/reuters-sparse/tfidf-vectors
/user/coder4/reuters-sparse/tokenized-documents
/user/coder4/reuters-sparse/wordcount
What each file is for:
- dictionary.file-0: maps term text -> term id (int). Converting terms to ids is common practice.
- frequency.file-0: term id -> collection frequency (cf).
- wordcount (directory): term text -> collection frequency (cf); this appears to be the information before the various filters are applied.
- df-count (directory): term id -> document frequency (df).
- tf-vectors, tfidf-vectors (both directories): term vectors, one document per row, in the form {term id: feature value}, where the feature value is tf or tfidf. Because they use the built-in VectorWritable type, view them with "mahout vectordump -i <path>".
- tokenized-documents: the documents after tokenization.
II. KMeans
1. Running K-Means
mahout kmeans -i /user/coder4/reuters-sparse/tfidf-vectors -c /user/coder4/reuters-kmeans-clusters -o /user/coder4/reuters-kmeans -k 20 -dm org.apache.mahout.common.distance.CosineDistanceMeasure -x 200 -ow --clustering
Parameters:
- -i: the input, i.e. the tfidf vectors produced above.
- -o: the result of each iteration is written here.
- -k: the number of clusters.
- -c: a curious one. If -k is not set, the points in this directory are used as the initial cluster centers. Otherwise, k points are picked at random as the centers and written to this directory.
- -dm: the distance measure; cosine distance is recommended for text.
- -x: the maximum number of iterations.
- --clustering: after the iterations finish, assign the input points to the resulting clusters. (Whether the job runs sequentially or on MapReduce is chosen with -xm, not with this flag.)
- --convergenceDelta: the convergence threshold, default 0.5, which is rather large for cosine distance.
Output 1, the initial randomly chosen centers:
hadoop dfs -ls /user/coder4/reuters-kmeans-clusters
Found 1 items
/user/coder4/reuters-kmeans-clusters/part-randomSeed
Output 2, the clustering process and results:
<table><tr><td>hadoop dfs -ls /user/coder4/reuters-kmeans<br /> Found 5 items<br /> /user/[color=#002D7A !important]<font 
face="inherit">coder4</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">reuters</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">kmeans</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">_policy</font></font> <br /> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">user</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">coder4</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">reuters</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">kmeans</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">clusteredPoints</font></font> <br /> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">user</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">coder4</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">reuters</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">kmeans</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">clusters</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#CE0000 !important]<font face="inherit">0</font></font> <br /> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font 
face="inherit">user</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">coder4</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">reuters</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">kmeans</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">clusters</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#CE0000 !important]<font face="inherit">1</font></font> <br /> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">user</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">coder4</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">reuters</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">kmeans</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">clusters</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#CE0000 !important]<font face="inherit">2</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#800080 !important]<font face="inherit">final</font></font> <br /> </td></tr></table><font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">其中,clusters-k(-final)为每次迭代后,簇的20个中心点的信息。</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">而clusterdPoints,存储了 簇id -> 文档id 的映射。</font></font></font><br /> <font color="#444444"><font face="Helvetica, 
Arial, sans-serif"><font style="font-size:14px">2. Viewing the clusters</font></font></font><br />
<font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">First, use clusterdump to inspect the k (20) clusters.</font></font></font><br />
<table><tr><td># Copy the results to the local file system<br />
hadoop dfs -get /user/coder4/reuters-kmeans/ ./<br />
hadoop dfs -get /user/coder4/reuters-sparse/ ./<br />
# Dump the clusters<br />
mahout clusterdump -i /user/coder4/reuters-kmeans/clusters-2-final -d ./reuters-sparse/dictionary.file-0 -dt sequencefile -o ./reuters-kmeans-cluster-dump/ -n 20<br />
</td></tr></table><font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">Note that clusterdump seems to run only locally, so download the data to the local machine first.</font></font></font><br />
<font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">The parameters:</font></font></font><br />
<ul>
<li>-i: we only look at the clusters produced by the final iteration.</li>
<li>-d: the term -> term id dictionary, so that the output shows the highest-weighted terms of each cluster as text rather than as term ids.</li>
<li>-dt: the type of that dictionary; ours was generated by seq2sparse, so it is a sequencefile.</li>
<li>-o: where the final output is written.</li>
<li>-n: for each cluster, output only the 20 highest-weighted terms.</li>
</ul>
<font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">Now look at the dump:</font></font></font><br />
<font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">There are 20 records, one for each of the 20 clusters. Each record looks like this:</font></font></font><br />
<table><tr><td>VL-12722{n=1305 c=[.... zorinsky's:0.011, zurich:0.006...], r=[.... yuan:1.055, yugoslav:1.027,...]}<br />
        Top Terms:<br />
                he                                      =>   3.105303428364896<br />
                said                                    =>  2.8756448350190205<br />
                would                                   =>  2.6413800148214874<br />
                have                                    =>  2.1552908992401942<br />
                government                              =>  1.8426488105364687<br />
                which                                   =>   1.749669294978467<br />
                economic                                =>  1.7431561736768233<br />
                has                                     =>  1.7429241635333532<br />
                prices                                  =>  1.7182022383386604<br />
                oil                                     =>   1.673632335845538<br />
                from                                    =>    1.64287882106971<br />
                u.s                                     =>  1.6223870217115028<br />
                had                                     =>   1.602064758607711<br />
                more                                    =>  1.5874425666999086<br />
                last                                    =>   1.561653600890061<br />
                we                                      =>  1.5274837373316974<br />
                been                                    =>  1.4653439554674872<br />
                year                                    =>  1.4279387724353894<br />
                could                                   =>  1.4152588548331426<br />
                minister                                =>  1.4146991936183066<br />
</td></tr></table><font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">The leading 12722 is the cluster ID, and n=1305 is the number of documents in the cluster. The c vector is the cluster center, in the form term:weight (the coordinate of the center along that term); r is the cluster's radius vector, in the form term:radius.</font></font></font><br />
<font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">The Top Terms listed underneath are the characteristic terms selected for the cluster, here the 20 highest-weighted ones as requested with -n.</font></font></font><br />
<font color="#444444"><font face="Helvetica, Arial, sans-serif"><font 
style="font-size:14px">3、查看聚类结果</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">其实,聚类结果中,更重要的是,文档被聚到了哪个类。</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">遗憾的是,在很多资料中,都没有说明这一点。前文我们已经提到了,簇id -> 文档id的结果,保存在了clusteredPoints下面。这也是mahout内置类型存储的。我们可以用seqdumper命令查看。</font></font></font><br /> [color=#333333 !important]<font face="inherit"></font></font><br /> <table><tr><td>1<br /> </td><td>[color=#004ED0 !important]<font face="inherit">mahout</font></font> [color=#002D7A !important]<font face="inherit">seqdumper</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">i</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">user</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">coder4</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">reuters</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">kmeans</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">clusteredPoints</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> <br /> </td></tr></table><font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">其中,-d和-dt的原因同clusterdump。</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">如果不指定-o,默认输出到屏幕,输出结果为形如:</font></font></font><br /> [color=#333333 !important]<font 
face="inherit"></font></font><br /> <table><tr><td>Key: 4255: Value: wt: 1.0 distance: 0.7752480913348985  vec: /reut2-000.sgm-0.txt = [14:4.670, 35:7.545, ... 11278:6.394, 11288:6.731]</td></tr></table><font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">The underlying data is an ordinary SequenceFile, so you can also read it with a program of your own.</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">Key is the ClusterID, as already described in the clusterdump section above.</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, 
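sans-serif"><font style="font-size:14px">If you would rather not read the SequenceFile directly, the text that seqdumper prints can be post-processed instead. The following is a minimal Python sketch, assuming each record is printed on a single line in the shape of the sample output; the regular expression and the names parse_record and group_by_cluster are illustrative assumptions, not part of Mahout.</font></font></font><br />

```python
import re

# Assumed record shape, taken from the sample seqdumper line in this post:
# Key: <cluster id>: Value: wt: <weight> distance: <distance>  vec: <doc name> = [...]
RECORD = re.compile(
    r"Key:\s*(?P<cluster>\d+):\s*Value:\s*wt:\s*(?P<wt>[\d.]+)\s*"
    r"distance:\s*(?P<distance>[\d.eE+-]+)\s+vec:\s*(?P<doc>\S+)\s*="
)


def parse_record(line):
    """Return (cluster_id, doc_name, weight, distance), or None if the line does not match."""
    m = RECORD.search(line)
    if m is None:
        return None
    return (
        int(m.group("cluster")),
        m.group("doc"),
        float(m.group("wt")),
        float(m.group("distance")),
    )


def group_by_cluster(lines):
    """Build a cluster id -> list of document names mapping from dumped lines."""
    clusters = {}
    for line in lines:
        record = parse_record(line)
        if record is None:
            continue  # skip headers, counts, and anything else that is not a record
        cluster_id, doc_name, _weight, _distance = record
        clusters.setdefault(cluster_id, []).append(doc_name)
    return clusters


sample = (
    "Key: 4255: Value: wt: 1.0 distance: 0.7752480913348985  "
    "vec: /reut2-000.sgm-0.txt = [14:4.670, 35:7.545]"
)
print(group_by_cluster([sample]))
```

<font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">In practice the lines would come from a file written with the -o option of seqdumper; the vector part after the equals sign is ignored here because only the cluster assignment is of interest.</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, 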
sans-serif"><font style="font-size:14px">Value is the clustering result for the document: wt is the probability that the document belongs to the cluster, which is always 1.0 for KMeans; /reut2-000.sgm-0.txt is the document identifier, which appears because of the -nv option passed to seq2sparse earlier; what follows are the term ids and weights of this point.</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">III. Fuzzy-KMeans</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">KMeans is a simple and effective clustering method, but it has some drawbacks.</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">For example, each point can belong to only one cluster, which is called hard clustering. In many cases soft clustering is the more sensible model: Harry Potter, for instance, is both a novel and a film. Fuzzy-KMeans achieves soft clustering by introducing a "degree of membership".</font></font></font><div class="message_content_answer_8561144" style="display: none;"> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">What is Mahout?</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">“Apache Mahout™ project’s goal is to build a scalable machine learning library”</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">To expand on that:<br /> (1) Mahout is an open-source Apache project that bundles a large number of machine learning algorithms.<br /> (2) Most of the algorithms can run on Hadoop and scale well, which makes machine learning on big data feasible.</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">This post focuses on how to use the clustering tools in Mahout 0.9.</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">I. Data preparation</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">The input to Mahout's clustering algorithms is a List&lt;Vector&gt;, so every document to be clustered has to be represented as a vector.</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">In this post we use the classic Reuters21578 text corpus and try to cluster the news articles by their text.</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font 
style="font-size:14px">1. Download the data</font></font></font><br /> <table><tr><td>axel -n 20 http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.tar.gz</td></tr></table><font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">2. Extract the archive</font></font></font><br /> <table><tr><td>mkdir -p ./reuters-sgm &amp;&amp; tar -xzvf ./reuters21578.tar.gz -C ./reuters-sgm</td></tr></table><font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">After extraction, reuters-sgm contains a number of *.sgm files, each of which holds many structured documents like the following:</font></font></font><br /> <table><tr><td>&lt;REUTERS TOPICS="NO" LEWISSPLIT="TRAIN" CGISPLIT="TRAINING-SET" OLDID="5545" NEWID="2"&gt;<br /> &lt;DATE&gt;26-FEB-1987 15:02:20.00&lt;/DATE&gt;<br /> &lt;TOPICS&gt;&lt;/TOPICS&gt;<br /> &lt;PLACES&gt;&lt;D&gt;usa&lt;/D&gt;&lt;/PLACES&gt;<br /> &lt;PEOPLE&gt;&lt;/PEOPLE&gt;<br /> &lt;ORGS&gt;&lt;/ORGS&gt;<br /> &lt;EXCHANGES&gt;&lt;/EXCHANGES&gt;<br /> &lt;COMPANIES&gt;&lt;/COMPANIES&gt;<br /> &lt;UNKNOWN&gt;<br /> F Y<br /> f0708 reute<br /> d f BC-STANDARD-OIL-&amp;lt;SRD&gt;-TO   02-26 0082&lt;/UNKNOWN&gt;<br /> &lt;TEXT&gt;<br /> &lt;TITLE&gt;STANDARD OIL &amp;lt;SRD&gt; TO FORM FINANCIAL UNIT&lt;/TITLE&gt;<br /> &lt;DATELINE&gt;    CLEVELAND, Feb 26 - &lt;/DATELINE&gt;&lt;BODY&gt;Standard Oil Co and BP North America<br /> Inc said they plan to form a venture to manage the money market<br /> borrowing and investment activities of both companies.<br />     BP North America is a subsidiary of British Petroleum Co<br /> Plc &amp;lt;BP&gt;, which also owns a 55 pct interest in Standard Oil.<br />     The venture will be called BP/Standard Financial Trading<br /> and will be operated by Standard Oil under the oversight of a<br /> joint management committee.<br /> Reuter<br /> &lt;/BODY&gt;&lt;/TEXT&gt;<br /> &lt;/REUTERS&gt;<br /> </td></tr></table><font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">Below we mainly use the text inside &lt;TITLE&gt; and &lt;BODY&gt;, that is, the title plus the body.</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">3. Extraction</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">Mahout ships with an extractor for the Reuters corpus above, which we can use directly.</font></font></font><br /> <table><tr><td>mahout org.apache.lucene.benchmark.utils.ExtractReuters ./reuters-sgm ./reuters-out</td></tr></table><font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">The extracted results are placed in the ./reuters-out directory, where each &lt;REUTERS&gt; document becomes a separate file.</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">There are 21578 txt files in total, i.e. the dataset contains 21578 documents :-)</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">A note on naming: the file ./reuters-out/reut2-006.sgm-246.txt, for example, is document number 246 from ./reuters-sgm/reut2-006.sgm, with indices starting at 0.</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">4. Converting to SequenceFile</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">With a traditional text clustering algorithm, the next step would be to convert the text into a vector-space representation over terms.</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">But not so fast.</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font 
style="font-size:14px">Mahout runs on Hadoop, and HDFS is designed for large files, so copying all 21578 txt files up to it would be a very poor fit.</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">Imagine clustering 10 million news articles: would we copy 10 million files? That would bring down the NameNode.</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">Mahout therefore uses SequenceFile as its basic data exchange format.</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">The built-in seqdirectory command (arguably misnamed; directoryseq would describe it better) converts a directory of text files into a SequenceFile.</font></font></font><br /> <table><tr><td>mahout seqdirectory -i file://$(pwd)/reuters-out/ -o file://$(pwd)/reuters-seq/ -c UTF-8 -chunk 64 -xm sequential</td></tr></table><font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">This command hides two big pitfalls that other documentation does not spell out:<br /> (1) -xm sequential runs the conversion locally instead of as a MapReduce job. With MapReduce we would have to upload all those small files to HDFS first, which would defeat the purpose of using SequenceFile.<br /> (2) Running in local mode does not make seqdirectory look in the local file system, however. It decides where the files live from the file system prefix of -i and -o, so by default it still looks in HDFS. That is why the file:// prefix is essential.</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">The other two parameters:</font></font></font><br /> <ul><br /> <li>-c UTF-8: the character encoding.<br /> <li>-chunk 64: 64 MB per chunk; this should match the HDFS block size or be a multiple of it.<br /> </ul><font 
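color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">The seqdirectory call can also be assembled from a script, which makes it harder to forget the file:// prefix. The following is a minimal Python sketch, assuming the same flags as the command in this section (-i, -o, -c, -chunk, -xm); the helper name seqdirectory_args is hypothetical, and Path.as_uri() is used here only as one way to produce an absolute file:// URI.</font></font></font><br />

```python
from pathlib import Path


def seqdirectory_args(input_dir, output_dir, charset="UTF-8", chunk_mb=64):
    """Build the argument list for a local (sequential) seqdirectory run.

    Both directories are turned into absolute file:// URIs so that the
    paths are resolved on the local file system rather than on HDFS.
    """
    in_uri = Path(input_dir).resolve().as_uri()
    out_uri = Path(output_dir).resolve().as_uri()
    return [
        "mahout", "seqdirectory",
        "-i", in_uri,
        "-o", out_uri,
        "-c", charset,
        "-chunk", str(chunk_mb),
        "-xm", "sequential",  # run locally, not as a MapReduce job
    ]


args = seqdirectory_args("reuters-out", "reuters-seq")
print(" ".join(args))
```

<font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">The resulting list can be handed to subprocess.run on a machine where the mahout launcher is on the PATH.</font></font></font><br /> <font 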
color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">5. Converting to vectors</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">To accommodate many kinds of data, clustering algorithms generally take vector-space data as input.</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">Since we already have the SequenceFile from the previous step, everything from here on can run on Hadoop.</font></font></font><br /> <table><tr><td>hadoop dfs -put reuters-seq /user/coder4</td></tr></table><font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">Now start the text -> Vector conversion:</font></font></font><br /> <table><tr><td>mahout seq2sparse -i /user/coder4/reuters-seq -o /user/coder4/reuters-sparse -ow --weight tfidf --maxDFPercent 85 --namedVector</td></tr></table><font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">The input and output options need no explanation. In Mahout this vector type is referred to as sparse.</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">The parameters are:</font></font></font><br /> <ul><br /> <li>-ow (or --overwrite): overwrite the output directory even if it already exists.<br /> <li>--weight (or -wt) tfidf: the weighting formula, here the familiar TF-IDF. The alternative is tf (recommended when using LDA).<br /> <li>--maxDFPercent (or -x) 85: filters out high-frequency terms; a term whose DF exceeds 85% is no longer emitted as a feature in the vectors.<br /> <li>--namedVector (or -nv): the output vectors carry additional information, namely the document name.<br /> </ul><font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">Other options that may be useful:</font></font></font><br /> <ul><br /> <li>--analyzerName (or -a): use a different analyzer (tokenizer).<br /> <li>--minDF: minimum DF threshold.<br /> <li>--minSupport: minimum support threshold, default 2.<br /> <li>--maxNGramSize (or -ng): maximum n-gram size, default 1 (no n-grams). Setting it to 2 is usually enough.<br /> <li>--minLLR (or -ml): the minimum Log Likelihood Ratio, default 1.0. When -ng is greater than 1, a larger value is recommended so that only meaningful n-grams are kept.<br /> <li>--logNormalize (or -lnorm): whether to apply a log transform to the output vectors.<br /> <li>--norm (or -n): whether to apply p-norm normalization to the output vectors; off by default.<br /> </ul><font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">Let's look at the output:</font></font></font><br /> <table><tr><td>hadoop dfs -ls /user/coder4/reuters-sparse<br /> Found 7 items<br /> /user/coder4/reuters-sparse/df-count<br /> /user/coder4/<font 
face="inherit">reuters</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">sparse</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">dictionary</font></font> [color=#333333 !important]<font face="inherit">.</font></font> [color=#002D7A !important]<font face="inherit">file</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#CE0000 !important]<font face="inherit">0</font></font> <br /> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">user</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">coder4</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">reuters</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">sparse</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">frequency</font></font> [color=#333333 !important]<font face="inherit">.</font></font> [color=#002D7A !important]<font face="inherit">file</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#CE0000 !important]<font face="inherit">0</font></font> <br /> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">user</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">coder4</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">reuters</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font 
face="inherit">sparse</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">tf</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">vectors</font></font> <br /> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">user</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">coder4</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">reuters</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">sparse</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">tfidf</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">vectors</font></font> <br /> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">user</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">coder4</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">reuters</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">sparse</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">tokenized</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">documents</font></font> <br /> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font 
face="inherit">user</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">coder4</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">reuters</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">sparse</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">wordcount</font></font> <br /> </td></tr></table><font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">说明各个文件的用途:</font></font></font><br /> <ul><br /> <li>dictionary.file-0:词文本 -> 词id(int)的映射。词转化为id,这是常见做法。<br /> <li>frequency.file:词id -> 文档集词频(cf)。<br /> <li>wordcount(目录): 词文本 -> 文档集词频(cf),这个应该是各种过滤处理之前的信息。<br /> <li>df-count(目录): 词id -> 文档频率(df)。<br /> <li>tf-vectors、tfidf-vectors (均为目录):词向量,每篇文档一行,格式为{词id:特征值},其中特征值为tf或tfidf。有用采用了内置类型VectorWritable,需要用命令”mahout vectordump -i <path>”查看。<br /> <li>tokenized-documents:分词后的文档。<br /> </ul><font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">二、KMeans</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">1、运行K-Means</font></font></font><br /> [color=#333333 !important]<font face="inherit"></font></font><br /> <table><tr><td>1<br /> </td><td>[color=#004ED0 !important]<font face="inherit">mahout </font></font>[color=#002D7A !important]<font face="inherit">kmeans</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">i</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">user</font></font> 
[color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">coder4</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">reuters</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">sparse</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">tfidf</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">vectors</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">c</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">user</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">coder4</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">reuters</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">kmeans</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">clusters</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">o</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">user</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> 
[color=#002D7A !important]<font face="inherit">coder4</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">reuters</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">kmeans</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#006FE0 !important]<font face="inherit">-</font></font> <font face="inherit">k</font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#CE0000 !important]<font face="inherit">20</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#006FE0 !important]<font face="inherit">-</font></font> [color=#004ED0 !important]<font face="inherit">dm </font></font>[color=#002D7A !important]<font face="inherit">org</font></font> [color=#333333 !important]<font face="inherit">.</font></font> [color=#002D7A !important]<font face="inherit">apache</font></font> [color=#333333 !important]<font face="inherit">.</font></font> [color=#002D7A !important]<font face="inherit">mahout</font></font> [color=#333333 !important]<font face="inherit">.</font></font> [color=#002D7A !important]<font face="inherit">common</font></font> [color=#333333 !important]<font face="inherit">.</font></font> [color=#002D7A !important]<font face="inherit">distance</font></font> [color=#333333 !important]<font face="inherit">.</font></font> [color=#002D7A !important]<font face="inherit">CosineDistanceMeasure</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#006FE0 !important]<font face="inherit">-</font></font> <font face="inherit">x</font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#CE0000 !important]<font face="inherit">200</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font 
face="inherit">ow</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#006FE0 !important]<font face="inherit">--</font></font> [color=#002D7A !important]<font face="inherit">clustering</font></font> <br /> </td></tr></table><font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">参数说明如下:</font></font></font><br /> <ul><br /> <li>-i:输入为上面产出的tfidf向量。<br /> <li>-o:每一轮迭代的结果将输出在这里。<br /> <li>-k:几个簇。<br /> <li>-c:这是一个神奇的变量。若不设定k,则用这个目录里面的点,作为聚类中心点。否则,随机选择k个点,作为中心点。<br /> <li>-dm:距离公式,文本类型推荐用cosine距离。<br /> <li>-x :最大迭代次数。<br /> <li>–clustering:在mapreduce模式运行。<br /> <li>–convergenceDelta:迭代收敛阈值,默认0.5,对于Cosine来说略大。<br /> </ul><font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">输出1,初始随机选择的中心点:</font></font></font><br /> [color=#333333 !important]<font face="inherit"></font></font><br /> <table><tr><td>1<br /> 2<br /> 3<br /> </td><td>[color=#004ED0 !important]<font face="inherit">hadoop</font></font> [color=#002D7A !important]<font face="inherit">dfs</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">ls</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">user</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">coder4</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">reuters</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">kmeans</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#004ED0 !important]<font face="inherit">clusters</font></font> <br /> <font 
face="inherit">Found</font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#CE0000 !important]<font face="inherit">1</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#002D7A !important]<font face="inherit">items</font></font> <br /> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">user</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">coder4</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">reuters</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">kmeans</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">clusters</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">part</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">randomSeed</font></font> <br /> </td></tr></table><font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">输出2,聚类过程、结果:</font></font></font><br /> [color=#333333 !important]<font face="inherit"></font></font><br /> <table><tr><td>1<br /> 2<br /> 3<br /> 4<br /> 5<br /> 6<br /> 7<br /> </td><td>[color=#004ED0 !important]<font face="inherit">hadoop </font></font>[color=#002D7A !important]<font face="inherit">dfs</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">ls</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font 
face="inherit">user</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">coder4</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">reuters</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#004ED0 !important]<font face="inherit">kmeans</font></font> <br /> <font face="inherit">Found</font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#CE0000 !important]<font face="inherit">5</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#002D7A !important]<font face="inherit">items</font></font> <br /> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">user</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">coder4</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">reuters</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">kmeans</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">_policy</font></font> <br /> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">user</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">coder4</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">reuters</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">kmeans</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A 
!important]<font face="inherit">clusteredPoints</font></font> <br /> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">user</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">coder4</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">reuters</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">kmeans</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">clusters</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#CE0000 !important]<font face="inherit">0</font></font> <br /> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">user</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">coder4</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">reuters</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">kmeans</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">clusters</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#CE0000 !important]<font face="inherit">1</font></font> <br /> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">user</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">coder4</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A 
!important]<font face="inherit">reuters</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">kmeans</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">clusters</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#CE0000 !important]<font face="inherit">2</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#800080 !important]<font face="inherit">final</font></font> <br /> </td></tr></table><font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">其中,clusters-k(-final)为每次迭代后,簇的20个中心点的信息。</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">而clusterdPoints,存储了 簇id -> 文档id 的映射。</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">2、查看簇结果</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">首先,用clusterdump,来查看k(20)个簇的信息。</font></font></font><br /> [color=#333333 !important]<font face="inherit"></font></font><br /> <table><tr><td>1<br /> 2<br /> 3<br /> 4<br /> 5<br /> </td><td>[color=#B85C00 !important]<font face="inherit"># Get to Local</font></font> <br /> [color=#004ED0 !important]<font face="inherit">hadoop</font></font> [color=#002D7A !important]<font face="inherit">dfs</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">get</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">user</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> 
[color=#002D7A !important]<font face="inherit">coder4</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">reuters</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">kmeans</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#333333 !important]<font face="inherit">.</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> <br /> [color=#004ED0 !important]<font face="inherit">hadoop</font></font> [color=#002D7A !important]<font face="inherit">dfs</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">get</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">user</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">coder4</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">reuters</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">sparse</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#333333 !important]<font face="inherit">.</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> <br /> [color=#B85C00 !important]<font face="inherit"># View ..</font></font> <br /> [color=#004ED0 !important]<font face="inherit">mahout</font></font> [color=#002D7A !important]<font face="inherit">clusterdump</font></font> [color=#006FE0 
!important]<font face="inherit"></font></font>[color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">i</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">user</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">coder4</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">reuters</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">kmeans</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">clusters</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#CE0000 !important]<font face="inherit">2</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#800080 !important]<font face="inherit">final</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#006FE0 !important]<font face="inherit">-</font></font> <font face="inherit">d</font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#333333 !important]<font face="inherit">.</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">reuters</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">sparse</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">dictionary</font></font> [color=#333333 !important]<font face="inherit">.</font></font> [color=#002D7A !important]<font face="inherit">file</font></font> [color=#006FE0 !important]<font 
face="inherit">-</font></font> [color=#CE0000 !important]<font face="inherit">0</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#006FE0 !important]<font face="inherit">-</font></font> [color=#004ED0 !important]<font face="inherit">dt</font></font> [color=#002D7A !important]<font face="inherit">sequencefile</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#006FE0 !important]<font face="inherit">-</font></font> <font face="inherit">o</font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#333333 !important]<font face="inherit">.</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">reuters</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">kmeans</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">cluster</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">dump</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#006FE0 !important]<font face="inherit">-</font></font> <font face="inherit">n</font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#CE0000 !important]<font face="inherit">20</font></font> <br /> </td></tr></table><font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">要说明的是,clusterdump似乎只能在本地执行……所以先把数据下载到本地吧。</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">参数说明:</font></font></font><br /> <ul><br /> <li>-i :我们只看最终迭代生成的簇结果。<br /> <li>-d :使用 词 -> 词id 映射,使得我们输出结果中,可以直接显示每个簇,权重最高的词文本,而不是词id。<br /> <li>-dt:上面映射类型,由于我们是seqdictionary生成的,so。。<br /> <li>-o:最终产出目录<br /> 
<li>-n: print only the 20 highest-weighted terms of each cluster.<br /> </ul><font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">Now look at the dump result:</font></font></font><br />
<font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">It contains 20 entries, one per cluster. Each entry looks like this:</font></font></font><br />
<table><tr><td>VL-12722{n=1305 c=[.... zorinsky's:0.011, zurich:0.006...], r=[.... yuan:1.055, yugoslav:1.027, ...]}<br />
Top Terms:<br />
he =&gt; 3.105303428364896<br />
said =&gt; 2.8756448350190205<br />
would =&gt; 2.6413800148214874<br />
have =&gt; 2.1552908992401942<br />
government =&gt; 1.8426488105364687<br />
which =&gt; 1.749669294978467<br />
economic =&gt; 1.7431561736768233<br />
has =&gt; [color=#006FE0 
!important]<font face="inherit">  </font></font> [color=#CE0000 !important]<font face="inherit">1.7429241635333532</font></font> <br /> [color=#006FE0 !important]<font face="inherit">                </font></font> [color=#002D7A !important]<font face="inherit">prices</font></font> [color=#006FE0 !important]<font face="inherit">                                  </font></font> [color=#006FE0 !important]<font face="inherit">=</font></font> [color=#006FE0 !important]<font face="inherit">></font></font> [color=#006FE0 !important]<font face="inherit">  </font></font> [color=#CE0000 !important]<font face="inherit">1.7182022383386604</font></font> <br /> [color=#006FE0 !important]<font face="inherit">                </font></font> [color=#002D7A !important]<font face="inherit">oil</font></font> [color=#006FE0 !important]<font face="inherit">                                     </font></font>[color=#006FE0 !important]<font face="inherit">=</font></font> [color=#006FE0 !important]<font face="inherit">></font></font> [color=#006FE0 !important]<font face="inherit">   </font></font>[color=#CE0000 !important]<font face="inherit">1.673632335845538</font></font> <br /> [color=#006FE0 !important]<font face="inherit">                </font></font> [color=#002D7A !important]<font face="inherit">from</font></font> [color=#006FE0 !important]<font face="inherit">                                    </font></font> [color=#006FE0 !important]<font face="inherit">=</font></font> [color=#006FE0 !important]<font face="inherit">></font></font> [color=#006FE0 !important]<font face="inherit">    </font></font> [color=#CE0000 !important]<font face="inherit">1.64287882106971</font></font> <br /> [color=#006FE0 !important]<font face="inherit">                </font></font> [color=#002D7A !important]<font face="inherit">u</font></font> [color=#333333 !important]<font face="inherit">.</font></font> [color=#002D7A !important]<font face="inherit">s</font></font> [color=#006FE0 !important]<font 
face="inherit">                                     </font></font>[color=#006FE0 !important]<font face="inherit">=</font></font> [color=#006FE0 !important]<font face="inherit">></font></font> [color=#006FE0 !important]<font face="inherit">  </font></font> [color=#CE0000 !important]<font face="inherit">1.6223870217115028</font></font> <br /> [color=#006FE0 !important]<font face="inherit">                </font></font> [color=#002D7A !important]<font face="inherit">had</font></font> [color=#006FE0 !important]<font face="inherit">                                     </font></font>[color=#006FE0 !important]<font face="inherit">=</font></font> [color=#006FE0 !important]<font face="inherit">></font></font> [color=#006FE0 !important]<font face="inherit">   </font></font>[color=#CE0000 !important]<font face="inherit">1.602064758607711</font></font> <br /> [color=#006FE0 !important]<font face="inherit">                </font></font> [color=#002D7A !important]<font face="inherit">more</font></font> [color=#006FE0 !important]<font face="inherit">                                    </font></font> [color=#006FE0 !important]<font face="inherit">=</font></font> [color=#006FE0 !important]<font face="inherit">></font></font> [color=#006FE0 !important]<font face="inherit">  </font></font> [color=#CE0000 !important]<font face="inherit">1.5874425666999086</font></font> <br /> [color=#006FE0 !important]<font face="inherit">                </font></font> [color=#002D7A !important]<font face="inherit">last</font></font> [color=#006FE0 !important]<font face="inherit">                                    </font></font> [color=#006FE0 !important]<font face="inherit">=</font></font> [color=#006FE0 !important]<font face="inherit">></font></font> [color=#006FE0 !important]<font face="inherit">   </font></font>[color=#CE0000 !important]<font face="inherit">1.561653600890061</font></font> <br /> [color=#006FE0 !important]<font face="inherit">                </font></font> [color=#002D7A 
!important]<font face="inherit">we</font></font> [color=#006FE0 !important]<font face="inherit">                                      </font></font> [color=#006FE0 !important]<font face="inherit">=</font></font> [color=#006FE0 !important]<font face="inherit">></font></font> [color=#006FE0 !important]<font face="inherit">  </font></font> [color=#CE0000 !important]<font face="inherit">1.5274837373316974</font></font> <br /> [color=#006FE0 !important]<font face="inherit">                </font></font> [color=#002D7A !important]<font face="inherit">been</font></font> [color=#006FE0 !important]<font face="inherit">                                    </font></font> [color=#006FE0 !important]<font face="inherit">=</font></font> [color=#006FE0 !important]<font face="inherit">></font></font> [color=#006FE0 !important]<font face="inherit">  </font></font> [color=#CE0000 !important]<font face="inherit">1.4653439554674872</font></font> <br /> [color=#006FE0 !important]<font face="inherit">                </font></font> [color=#002D7A !important]<font face="inherit">year</font></font> [color=#006FE0 !important]<font face="inherit">                                    </font></font> [color=#006FE0 !important]<font face="inherit">=</font></font> [color=#006FE0 !important]<font face="inherit">></font></font> [color=#006FE0 !important]<font face="inherit">  </font></font> [color=#CE0000 !important]<font face="inherit">1.4279387724353894</font></font> <br /> [color=#006FE0 !important]<font face="inherit">                </font></font> [color=#002D7A !important]<font face="inherit">could</font></font> [color=#006FE0 !important]<font face="inherit">                                   </font></font>[color=#006FE0 !important]<font face="inherit">=</font></font> [color=#006FE0 !important]<font face="inherit">></font></font> [color=#006FE0 !important]<font face="inherit">  </font></font> [color=#CE0000 !important]<font face="inherit">1.4152588548331426</font></font> <br /> [color=#006FE0 
!important]<font face="inherit">                </font></font> [color=#002D7A !important]<font face="inherit">minister</font></font> [color=#006FE0 !important]<font face="inherit">                                </font></font> [color=#006FE0 !important]<font face="inherit">=</font></font> [color=#006FE0 !important]<font face="inherit">></font></font> [color=#006FE0 !important]<font face="inherit">  </font></font> [color=#CE0000 !important]<font face="inherit">1.4146991936183066</font></font> <br /> </td></tr></table><font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">其中前面的12722是簇的ID,n=1305即簇中有这么多个文档。c向量是簇中心点向量,格式为 词文本:权重(点坐标),r是簇的半径向量,格式为 词文本:半径。</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">下面的Top Terms是簇中选取出来的特征词。</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">3、查看聚类结果</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">其实,聚类结果中,更重要的是,文档被聚到了哪个类。</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">遗憾的是,在很多资料中,都没有说明这一点。前文我们已经提到了,簇id -> 文档id的结果,保存在了clusteredPoints下面。这也是mahout内置类型存储的。我们可以用seqdumper命令查看。</font></font></font><br /> [color=#333333 !important]<font face="inherit"></font></font><br /> <table><tr><td>1<br /> </td><td>[color=#004ED0 !important]<font face="inherit">mahout</font></font> [color=#002D7A !important]<font face="inherit">seqdumper</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">i</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">user</font></font> [color=#006FE0 
!important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">coder4</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">reuters</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#002D7A !important]<font face="inherit">kmeans</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">clusteredPoints</font></font> [color=#006FE0 !important]<font face="inherit">/</font></font> <br /> </td></tr></table><font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">其中,-d和-dt的原因同clusterdump。</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">如果不指定-o,默认输出到屏幕,输出结果为形如:</font></font></font><br /> [color=#333333 !important]<font face="inherit"></font></font><br /> <table><tr><td>1<br /> </td><td>[color=#002D7A !important]<font face="inherit">Key</font></font> [color=#006FE0 !important]<font face="inherit">:</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#CE0000 !important]<font face="inherit">4255</font></font> [color=#006FE0 !important]<font face="inherit">:</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#002D7A !important]<font face="inherit">Value</font></font> [color=#006FE0 !important]<font face="inherit">:</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#002D7A !important]<font face="inherit">wt</font></font> [color=#006FE0 !important]<font face="inherit">:</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#CE0000 !important]<font face="inherit">1.0</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#002D7A !important]<font face="inherit">distance</font></font> [color=#006FE0 !important]<font 
face="inherit">:</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#CE0000 !important]<font face="inherit">0.7752480913348985</font></font> [color=#006FE0 !important]<font face="inherit">  </font></font> [color=#002D7A !important]<font face="inherit">vec</font></font> [color=#006FE0 !important]<font face="inherit">:</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#006FE0 !important]<font face="inherit">/</font></font> [color=#002D7A !important]<font face="inherit">reut2</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#CE0000 !important]<font face="inherit">000.sgm</font></font> [color=#006FE0 !important]<font face="inherit">-</font></font> [color=#CE0000 !important]<font face="inherit">0.txt</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#006FE0 !important]<font face="inherit">=</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#333333 !important]<font face="inherit">[</font></font> [color=#CE0000 !important]<font face="inherit">14</font></font> [color=#006FE0 !important]<font face="inherit">:</font></font> [color=#CE0000 !important]<font face="inherit">4.670</font></font> [color=#333333 !important]<font face="inherit">,</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#CE0000 !important]<font face="inherit">35</font></font> [color=#006FE0 !important]<font face="inherit">:</font></font> [color=#CE0000 !important]<font face="inherit">7.545</font></font> [color=#333333 !important]<font face="inherit">,</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#333333 !important]<font face="inherit">.</font></font> [color=#333333 !important]<font face="inherit">.</font></font> [color=#333333 !important]<font face="inherit">.</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#CE0000 !important]<font 
face="inherit">11278</font></font> [color=#006FE0 !important]<font face="inherit">:</font></font> [color=#CE0000 !important]<font face="inherit">6.394</font></font> [color=#333333 !important]<font face="inherit">,</font></font> [color=#006FE0 !important]<font face="inherit"></font></font>[color=#CE0000 !important]<font face="inherit">11288</font></font> [color=#006FE0 !important]<font face="inherit">:</font></font> [color=#CE0000 !important]<font face="inherit">6.731</font></font> [color=#333333 !important]<font face="inherit">]</font></font> <br /> </td></tr></table><font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">其实,这个输出是一个SequenceFile,大家自己写程序也可以读出来的。</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">Key是ClusterID,上面clusterdump的时候,已经说了。</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">Value是文档的聚类结果:wt是文档属于簇的概率,对于kmeans总是1.0,/reut2-000.sgm-0.txt就是文档标志啦,前面seqdirectionary的-nv起作用了,再后面的就是这个点的各个词id和权重了。</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">三、Fuzzy-KMeans</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">KMeans是一种简单有效的聚类方法,但存在一些缺点。</font></font></font><br /> <font color="#444444"><font face="Helvetica, Arial, sans-serif"><font style="font-size:14px">例如:一个点只能属于一个簇,这种叫做硬聚类。而很多情况下,软聚类才是科学的。例如:《哈利波》属于小说,也属于电影。Fuzzy-Kmeans 通过引入“隶属度”的方式,实现了软聚类。</font></font></font></div> <!-- 选为最佳答案 --> <div class="best_aws"> </div> <div class="reply_f"> <span class="show-replay" data-authorid="2255630" data-pid="8561144"><img src="https://staticbbs.elecfans.com/template/xinrui_iuni_mobile/touch/images/img/d_comment.png"/><i></i></span> <span class="com_dianzan" data-tid="2168577" data-pid="8561144"> <img 
src="/template/xinrui_iuni_mobile/touch/images/img/d_zan.png"/> <i> </i> </span> <span class="report-btn" data-fid="999" data-pid="8561144"><i>Report</i></span> </div> <!-- 回复 box --> <div class="self_reply reply-box reply_list"> </div> </div> </div> </div> <!-- main postlist end --> <!-- <div style="padding-bottom: 20px;width: 96%;margin:20px auto;overflow: hidden;"> <a href="http://z.elecfans.com/272.html?elecfans_trackid=bbs_marticle" target="_blank"><img
style="border:1px solid #e4e4e4;" src="/template/xinrui_iuni_mobile/touch/images/img/201901ad.jpg?rJf" width = "100%" /></a> </div> --> <!-- <iframe id="iframe_tags" data-elecfans_trackid="bbs__detail_m" data-tag="Mahout,工具" src="/static/iframe/course_m.html" width="100%" height="250" style="border: none;margin: 20px 0 0 0;"></iframe> --> <form method="post" autocomplete="off" name="modactions" id="modactions"> <input type="hidden" name="formhash" value="e40e47aa" /> <input type="hidden" name="optgroup" /> <input type="hidden" name="operation" /> <input type="hidden" name="listextra" value="" /> </form> <script type="text/javascript"> $('.favbtn').on('click', function() { var obj = $(this); $.ajax({ type:'POST', url:obj.attr('href') + '&handlekey=favbtn&inajax=1', data:{'favoritesubmit':'true', 'formhash':'e40e47aa'}, dataType:'xml', }) .success(function(s) { popup.open(s.lastChild.firstChild.nodeValue); evalscript(s.lastChild.firstChild.nodeValue); }) .error(function() { window.location.href = obj.attr('href'); popup.close(); }); return false; }); </script> <a href="javascript:;" title="返回顶部" class="scrolltop bottom"></a> </div> <!-- 分享注释 --> <!-- <div id="viewShare" class="viewShare"> <div class="bdsharebuttonbox"> <ul class="cfix"> <li><a href="#" class="bds_tsina" data-cmd="tsina" title="分享到新浪微博"></a></li> <li><a href="#" class="bds_weixin" data-cmd="weixin" title="分享到微信"></a></li> <li><a href="#" class="bds_sqq" data-cmd="sqq" title="分享到QQ好友"></a></li> <li><a href="#" class="bds_qzone" data-cmd="qzone" title="分享到QQ空间"></a></li> </ul> </div> 
<script>window._bd_share_config={"common":{"bdSnsKey":{},"bdText":"","bdMini":"2","bdMiniList":false,"bdPic":"","bdStyle":"0","bdSize":"32"},"share":{},"image":{"viewList":["tsina","weixin","sqq","qzone"],"viewText":"分享到:","viewSize":"16"},"selectShare":{"bdContainerClass":null,"bdSelectMiniList":["tsina","weixin","sqq","qzone"]}};with(document)0[(getElementsByTagName('head')[0]||body).appendChild(createElement('script')).src='https://skin.elecfans.com/bdshare/static/api/js/share.js?v=89860593.js?cdnversion='+~(-new Date()/36e5)];</script> </div> --> <!-- 发帖按钮 --> <div class="g_postBtn g_postBtn_bottom hideInApp"> <a href="/m/forum.php?mod=post&action=newthread&fid=999&special=3" class="g_post-a"> <!-- <img src="/template/xinrui_iuni_mobile/touch/group/images/post_icon.png"/> --> 发帖 </a> </div> <!-- 广告位链接 --> <div class="aside_adv aside_adv1" data-href="/member.php?mod=logging&action=login&datahref=jishu_2178202_1_1" > <div class="close_adv close_adv1"></div> </div> <div class="aside_adv aside_adv2" data-href="/member.php?mod=logging&action=login&datahref=try" > <div class="close_adv close_adv2"></div> </div> <div class="aside_adv aside_adv3" data-href="/member.php?mod=logging&action=login&datahref=jishu_2176945_1_1" > <div class="close_adv close_adv3"></div> </div> <div class="aside_adv aside_adv4" data-href="/member.php?mod=logging&action=login&datahref=engineer" > <div class="close_adv close_adv4"></div> </div> <div class="aside_adv aside_adv5" > </div> <!-- 广告位链接 --> <!-- 登录/注册 --> <div class="login-reg-fixed hideInApp"> <a href="/member.php?mod=logging&action=login" class="login-reg-btn"> 登录/注册 </a> </div> <!-- 打开APP --> <script src="https://staticbbs.elecfans.com/static/qrious.min.js?rJf" type="text/javascript"></script> <script src="https://staticbbs.elecfans.com/static/createCover.js?rJf" type="text/javascript"></script> <script src="https://staticbbs.elecfans.com/static/js/mobile/common/index.js?rJf" type="text/javascript"></script> <script 
type="text/javascript"> $(function() { /*--------------------------随机显示广告位----------------------*/ //当天没有关闭 if(apply_newDate_current_time() - (window.localStorage.getItem("adv_newDate_12") || 0) >=1 ){ getadShow() } $(".aside_adv").remove() var ua = navigator.userAgent; ua = ua.toLocaleLowerCase(); // 根据Uid判断是否登录 var isLogin = $("#Uid").val() || 0; if(!isLogin) { var attachNode = document.querySelectorAll('a') Array.prototype.forEach.call(attachNode, function (link) { // 不显示a链接 link.setAttribute("href","javascript:;") link.addEventListener('click', function (e) { if(IS_IN_APP){ return false } location.href = '/m/member.php?mod=logging&action=login' }) }) } function getadShow(){ //var randomIndex=randomNum(1,4); //四个广告随机显示隐藏 $(".aside_adv").hide(); $(".aside_adv5").show(); } //window.localStorage.setItem("apply__user_newDate",apply_newDate_current()) //点击关闭当天不在开启 $(".aside_adv").on("click",".close_adv",function(){ //第五个广告显示 if($(this).parents(".aside_adv").hasClass("aside_adv5") ){ window.localStorage.setItem("adv_newDate_12",apply_newDate_current_time()) }else{ window.localStorage.setItem("adv_newDate",apply_newDate_current_time()) } $(".aside_adv").remove() }) //点击跳转链接 $("body").on("click",".aside_adv",function(){ if($(this).hasClass("aside_adv5")){ //window.open("https://dfm.elecfans.com/viewer/?from=pdfdb") }else{ var data_href=$(this).attr("data-href") var res=$(this).attr("res") if(location.host.indexOf(".net")>-1){ window.open("https://bbs.elecfans.net"+data_href ) }else{ window.open( "https://bbs.elecfans.com"+data_href ) } } }) /*生成随机数*/ function randomNum(minNum,maxNum){ switch(arguments.length){ case 1: return parseInt(Math.random()*minNum+1,10); break; case 2: return parseInt(Math.random()*(maxNum-minNum+1)+minNum,10); break; default: return 0; break; } } //获取年月 function apply_newDate_current(){ var date=new Date(); //获取年份 var year=date.getFullYear(); var month=date.getMonth()+1; var day=date.getDate(); return year+"-"+month+"-"+day; } //获取时间 function 
apply_newDate_current_time(){ var date=new Date(); //获取年份 var year=date.getFullYear(); var month=date.getMonth()+1; var day=date.getDate(); return year+""+(month>9?month:("0"+month))+""+(day>9?day:("0"+day)); } /*--------------------------随机显示广告位----------------------*/ //判断是否显示全局登录按钮 var isLoginShowTimes = 0; //60秒就显示 var inter = null; var uid_v = parseInt('0'); if(uid_v==0){ var ua = navigator.userAgent; ua = ua.toLocaleLowerCase(); if(ua.indexOf("appandroid")==-1 && ua.indexOf("appios")==-1){ //不是在app中控制显示影藏 inter = setInterval(function(){ isLoginShowTimes++; if(isLoginShowTimes == 10){ $(".login-reg-fixed").show(); isLoginShowTimes = 0; } },100) } } $(document).on('scroll',function(ev){ //是否显示全局登录 isLoginShowTimes = 0; $(".login-reg-fixed").hide(); }) //消息推送初始化 if($(".bbs_message_tip").length!==0 &&is_weixn() && 2253333 == 0 ){ message_init(1) $(".bbs_message_tip").show() } //是微信环境下显示 否则不显示 // if( is_weixn()){ // $(".bbs_message_tip").show() // } $(".bbs_message_tip").on("click",function(){ message_init() }) function is_weixn(){ var ua = navigator.userAgent.toLowerCase(); if(ua.match(/MicroMessenger/i)=="micromessenger") { return true; } else { return false; } } //消息推送接口 function message_init(type){ $.ajax({ type: 'get', url: "/app/api/index.php?s=Home/Message/start", data:{tid:2168577,type:type}, dataType: "json", success: function (res) { // 状态返回1 页面显示关闭 // 状态返回2 页面显示开启 if(res.data.state == 1){ $(".bbs_message_tip_text").html("关闭该帖子的消息推送") $(".bbs_message_tip img").attr("src","/template/xinrui_iuni_mobile/images/msg_close.png") }else{ $(".bbs_message_tip_text").html("开启该帖子的消息推送") $(".bbs_message_tip img").attr("src","/template/xinrui_iuni_mobile/images/msg_open.png") } if(type==1){ }else{ // if(res.data.state == 1){ // $(".bbs_message_tip_text").html("开启该帖子的消息推送") // $(".bbs_message_tip img").attr("src","/template/xinrui_iuni_mobile/images/msg_open.png") // }else{ // $(".bbs_message_tip_text").html("关闭该帖子的消息推送") // $(".bbs_message_tip 
img").attr("src","/template/xinrui_iuni_mobile/images/msg_close.png") // } layer.msg(res.msg) } } }) } // $('#viewShare').mmenu({ // autoHeight : true, // navbar : { // title : false // }, // offCanvas : { // position : "bottom", // zposition : "front", // modal : true // } // }); //二维码图片 var qrHref =window.location.href.split("#")[0]; var qr = new QRious({ element: document.getElementById('qrious'), size: 250, value: qrHref, backgroundAlpha: 0.8, foregroundAlpha: 0.8, level: 'H', padding: 25 }); //分享海报 var title = $('.overSpot').eq(0).text(); var columnTypeName = $('.forumListHeader h1').text(); var scriptDes = $(".postListItem").find('script').text(); var styleDes = $(".postListItem").find('style').text(); var coverDesc =$(".postListItem").eq(0).text() //将作者图像转换成为base64资源 var columnTypeImg = $("#baseData").attr('src'); coverDesc = coverDesc.replace(scriptDes, '') coverDesc = coverDesc.replace(styleDes, '').trim() if(columnTypeImg == ''){ columnTypeImg="/static/image/ele1-logo.png" } var createCoverConfig = { titleStr: title , logo: columnTypeImg, elecLogo:'/static/image/new-elecfans-logo.png', arcTit: columnTypeName, coverDesc:coverDesc } createCover.init(createCoverConfig); }); $("#linkForm").unbind("click").click(function(){ location.href=$(".btRepPost a").attr('href'); }) var fid = parseInt('999'), tid = parseInt('2168577'); function setanswer(pid, from, astype,uid,message){ var as_msg = '您确认要把该回复选为“最佳答案”吗?'; if (astype == '2') { as_msg = '您确认要把该主题设置为“无满意答案”吗?'; } if (astype == '3') { as_msg = '您确认要把该主题设置为“无答案”吗?'; } if(confirm(as_msg)){ $('#modactions').attr('action','forum.php?mod=misc&action=bestanswer&tid=' + tid + '&pid=' + pid + '&from=' + from + '&astype=' + astype + '&bestanswersubmit=yes'); $('#modactions').submit(); if($("#special").val()==3){ var message_content_answer=$("."+message).text() //是问答 是特别关注、; $.ajax({ type:"get", url:"/app/api/index.php?s=Home/Message/best", data:{ uid:uid, title:"Mahout 0.9中的聚类(Clustering)工具的用法", username:'王飞云', 
content:message_content_answer, tid:'2168577' }, success:function(res){ } }) } } } //页面跳转的优化 if( localStorage.getItem('url_form') ){ $(".goBack").attr('href','/m/group.php'); localStorage.removeItem('url_form') } $(".img_one li").unbind("click").click(function(){ if( !localStorage.getItem('url_form') && $(".goBack").attr('href') == '/m/group.php' ){ localStorage.setItem('url_form','group_index'); } }); $(".img_list li").each(function(){ $(this).unbind("click").click(function(){ if( !localStorage.getItem('url_form') && $(".goBack").attr('href') == '/m/group.php' ){ localStorage.setItem('url_form','group_index'); } }) }) /* * 移动端文章内页和论坛内页增加广告位 */ // $('.new_header').before( // '<div class="adtipbox" style="background:#d00;display:flex;padding:5px 20px;position: fixed;top: 0;left: 0;z-index: 1000;">'+ // '<a href="https://t.elecfans.com/active/10years.html?elecfans_trackid=bbs_m" target="_blank">'+ // '<p style="color:#fff;font-size:12px;">发烧友10周年庆典,豪华礼包派送!每天最高5次抽奖机会》点击抽奖《</p>'+ // '</a>'+ // '<span class="tipclose" style="background:url(/template/xinrui_iuni_mobile/touch/images/img/tipclose.png) no-repeat center right;background-size:29%;display:block;float:right;width:70px;height:40px;"></span>'+ // '</div>' // ).css({ // 'top':'52px', // 'margin-bottom':'60px' // }) // $('.container').css('margin-top','96px'); if($(".tipclose")){ $(".tipclose").unbind("click").click(function(){ $('.adtipbox').remove(); $('.new_header').css('top','0px'); $('.container').css('margin-top','44px'); }) } //评论的点赞或者取消点赞 $(".com_dianzan").unbind("click").click(function(e){ var pid = $(this).attr("data-pid"); var tid = $(this).attr("data-tid"); var that = $(this); //在app中没有登录调取app登录 if(Uid == 0 && IS_IN_APP){ if(isClicked){ isClicked = false; return false } isClicked = true; callAppFunction("h5CookieLogin") e.stopPropagation(); return false } isVerification($, "", "", function(){ $.ajax({ type:"get", url:"/m/forum.php?", data :{ 'mod': 'misc', 'action' : 'postreview', 'do' : "support", 
'pid' : pid, "tid" : tid, 'hash' : $("#hash").val(), 'ajaxdata' : 'json' }, complete:function(res){ if( res.status == 200 && res.readyState == 4 ){ var data = JSON.parse(res.responseText); if(data.data.status == "successed" ){ if( data.data.code == 1 ){ var d_num = that.find('i').text()/1 + 1/1 ; that.html('<img src="https://staticbbs.elecfans.com/template/xinrui_iuni_mobile/touch/images/img/d_zan_on.png"/><i style="color:#d00;">'+d_num+'</i>') }else if( data.data.code == 2 ){ var d_num = that.find('i').text()/1 - 1/1 ; if( d_num == 0 ) d_num = '' that.html('<img src="https://staticbbs.elecfans.com/template/xinrui_iuni_mobile/touch/images/img/d_zan.png"/><i>'+d_num+'</i>') } }else{ if(data.message == "您需要先登录才能继续本操作"){ location.href = '/m/member.php?mod=logging&action=login' } } } } }); }) }) var Uid = $("#Uid").val(); /*关注别人或者取消*/ $(".attention_btn").unbind("click").click(function(e){ if( Uid != 0 ){ var getId = $(this).attr('data-uid'); var that = $(this); if( $(this).text() == "关注"){ $.ajax({ type:"get", url:"/infocenter.php?", data :{ 'mod': 'spacecp', 'ac' : 'follow', 'do' : "support", "op" : "add" , 'fuid' : getId, 'hash' : $("#hash").val(), 'ajaxdata' : 'json' }, complete:function(res){ if( res.status == 200 && res.readyState == 4 ){ var data = JSON.parse(res.responseText); that.removeClass("attention_no"); that.text('已关注') } } }); }else if( $(this).text() == "已关注" ){ $.ajax({ type:"get", url:"/infocenter.php?", data :{ 'mod': 'spacecp', 'ac' : 'follow', "op" : "del", 'fuid' : getId, 'hash' : $("#hash").val(), 'ajaxdata' : 'json' }, complete:function(res){ if( res.status == 200 && res.readyState == 4 ){ var data = JSON.parse(res.responseText); that.addClass("attention_no"); that.text('关注') } } }); } }else{ location.href = '/m/member.php?mod=logging&action=login' } }); if( ($(".postListHd").text()).replace(/[^0-9]/ig,"") < 5 ){ $(".tle_w_More").hide(); } if( $(".attention_btn").attr('data-uid') == Uid ){ $(".attention_btn").hide(); } //加载更多 var t_page = 1; var 
url = location.href; $(".tle_w_More").unbind("click").click(function(){ var that = $(this); if( that.text() == '已显示全部回帖'){ return false; } that.html('正在加载......') t_page ++ $.ajax({ url: url, data : { ajaxpage: 1, page : t_page, ajaxdata : 'json' } , success:function(res){ $(".postlist").append(res); $('.report-btn').unbind('click').click(function () { const that = $(this); if ($('.h_avatar').find('.no-login').length > 0) { window.location.href = '/member.php?mod=logging&action=login' return false } layer.open({ type: 1, title: '举报', btn: ['确定'], area: ['100%', 'auto'], skin: 'report-dialog', content: CommentHTML, yes: function (index, lay) { const message = $('input[name="drone"]:checked').val() == '其它' ? $('.public-comment__report-textarea textarea').val() : $('input[name="drone"]:checked').val(); const rid = that.attr('data-pid'); const fid = that.attr('data-fid'); const referer = window.location.href if (!message) { layer.msg('请选择举报理由'); return false; }; $.ajax({ url: '/misc.php?mod=report&message='+message+'&referer='+referer+'&reportsubmit=2&rtype=post&inajax=1&rid='+ rid +'&fid='+fid, type: 'post', success: function(result) { if(result.error_code !== 0) { layer.msg(result.msg); } else { layer.msg(result.msg); layer.close(index); } } }) return false; } }); }); that.html('<p>更多回帖<img src="https://staticbbs.elecfans.com/template/xinrui_iuni_mobile/touch/images/img/d_more.png"/></p>') /* 点击回复显示评论框 */ $(".show-replay").each(function(index){ $(this).unbind("click").click(function(e){ if( Uid != 0 ){ // 关闭入口-跳转通告页面 var post_close_state = "0" || 1, G_fid = "999" || 0, open_fid_val = "1700,1702,1685,1703,640" || "", open_fid_arr = open_fid_val.split(","); if (post_close_state == 0 || open_fid_arr.indexOf(G_fid) != -1) { // console.log('可以评论') } e.stopPropagation(); $("[name=pid]").val($(this).attr('data-pid')); askType =0; fcid = 0 ; topid = 0; conmment_index = index; $('.reply_form').show(); $(".reply_tiezhi").hide(); $('.reply-form').find('.reply-ipt').focus(); 
$('.reply-form').find('.reply-ipt').attr('placeholder',originFont); } }); }); }, error:function(){ $(".tle_w_More").html('已显示全部回帖'); } }) }) var coVal = ''; //获取cookie的value var isCoBack = false; //cookie是否有返回 if(ua.indexOf("appandroid")!=-1 || ua.indexOf("appios") !=-1){ $.get('/m/forum.php?mod=api_user&action=getauthsign',function(res){ isCoBack = true if(res.data){ coVal = res.data.content; } }) } $(document).off('click', '.compulsoryPop').on('click', '.compulsoryPop', function(){ var hrefValue = $(this).attr("href"); if(ua.indexOf("appandroid")!=-1 || ua.indexOf("appios") !=-1){ if($("#Uid").val()== 0 || $("#Uid").val()== ''){ if(isClicked){ isClicked = false; return false } isClicked = true; callAppFunction("h5CookieLogin") return false } } //完善资料 var pop_this = this $(this).addClass('compulsoryPop') if (isPerfectInfo($, pop_this)) { var downUrl = $(this).attr("href"); var dObj = window.location.origin + '/m/' + downUrl +'&auth_sign='+coVal; if(ua.indexOf("appandroid")!=-1 || ua.indexOf("appios") !=-1){ if(!isCoBack){return false} if (ua.indexOf("appios") > -1) { try{ console.log('in util is ios') window.webkit.messageHandlers['H5OpenBrowser'].postMessage(dObj) }catch(e){ console.log(e) } } else { try{ console.info('in util is android', window.quotationSystem, 'H5OpenBrowser') window.quotationSystem['H5OpenBrowser'](dObj) }catch(e){ console.log(e) } } return false } } else { return false; } }) if($("#hm").val()){ $('.goBack').unbind("click").click(function(){ if(history.length > 2){ history.back(-1); }else { location.href = '/m/hm_default.php' } return false; }) } /** * 举报文本 */ const CommentHTML = '<div class="public-comment__report-content">\ <div class="public-comment__report-tips">请点击举报理由</div>\ <div class="public-comment__report-radios">\ <label>\ <input type="radio" name="drone" value="广告垃圾" />\ <span>广告垃圾</span>\ </label >\ <label>\ <input type="radio" name="drone" value="违规内容" />\ <span>违规内容</span>\ </label>\ <label>\ <input type="radio" name="drone" 
value="恶意评论" />\ <span>恶意评论</span>\ </label>\ <label>\ <input type="radio" name="drone" value="重复内容" />\ <span>重复内容</span>\ </label>\ <label>\ <input type="radio" name="drone" value="其它" />\ <span>其它</span>\ </label>\ </div>\ <div class="public-comment__report-textarea">\ <span class="static"><i>0</i>/200</span>\ <textarea maxlength="200" placeholder="请填写举报内容"></textarea>\ </div>\ </div>'; /** * 举报按钮 */ $(document).delegate('.public-comment__report-textarea textarea', 'propertychange input', function () { let value = $(this).val(); $(this).siblings('.static').find('i').text(value.length); }); $(document).on('click', 'input[name="drone"]', function () { const val = $(this).val(); if (val == '其它') { $('.public-comment__report-textarea').show(); } else { $('.public-comment__report-textarea').hide(); } }); $('.report-btn').unbind('click').click(function (e) { const that = $(this); //在app中没有登录调取app登录 if(Uid == 0 && IS_IN_APP){ if(isClicked){ isClicked = false; return false } isClicked = true callAppFunction("h5CookieLogin") e.stopPropagation(); return false } if ($('.h_avatar').find('.no-login').length > 0) { window.location.href = '/member.php?mod=logging&action=login' return false } layer.open({ type: 1, title: '举报', btn: ['确定'], area: ['100%', 'auto'], skin: 'report-dialog', content: CommentHTML, yes: function (index, lay) { const message = $('input[name="drone"]:checked').val() == '其它' ? 
$('.public-comment__report-textarea textarea').val() : $('input[name="drone"]:checked').val(); const rid = that.attr('data-pid'); const fid = that.attr('data-fid'); const referer = window.location.href if (!message) { layer.msg('请选择举报理由'); return false; }; $.ajax({ url: '/misc.php?mod=report&message='+message+'&referer='+referer+'&reportsubmit=2&rtype=post&inajax=1&rid='+ rid +'&fid='+fid, type: 'post', success: function(result) { if(result.error_code !== 0) { layer.msg(result.msg); } else { layer.msg(result.msg); layer.close(index); } } }) return false; } }); }); var adIfUrl = ""; var locationId = ""; if (window.location.hostname == 'bbs.elecfans.net') { adIfUrl = "https://advert_server.elecfans.net"; locationId = "114"; } else if (window.location.hostname == 'uat-bbs.elecfans.com') { adIfUrl = "https://uat-advert-server.elecfans.com"; locationId = "93"; } else { adIfUrl = "https://advert-server.elecfans.com"; locationId = "77"; } //获取广告系统的广告 getAdvertFn() function getAdvertFn() { var obj = { platform: 'elecfans', location: locationId, user_id: "0" || '0', status: '1', page: '1', limit: '100', url: window.location.href, } $.ajax({ url: adIfUrl + '/api/advert/getListByPlaAndLoc', type: 'POST', data: obj, success: function success(res) { if (res.code == 0) { var data = res.data.data; var len = data.length if (len > 0) { var key = Math.random() * 10; key = parseInt(key)%len $(".advertWrap a").attr("href",data[key].extras?.open_url).attr("data-id",data[key].id) $(".advertWrap img").attr("src",data[key].url) $(".advertWrap").show(); } else { $(".advertWrap").hide(); } } }, error: function error(err) { console.log(err); } }); } //记录广告位置点击 $(".advertWrap a").click(function(){ var para = { advert_id: $(this).attr("data-id"), button: 1, // 广告上报 pc默认传 1 user_id: "0" || '0', device_id: navigator.userAgent, url: decodeURIComponent(window.location.href) }; $.ajax({ url: adIfUrl + '/api/advert/log', type: 'POST', data: para, }) }) </script> <!-- footer --> <script> //问答新的百度统计 
</script> <!--查看--> <div class="go_elecfans flex bg_white" style="display: none;"> <a href="https://dfm.elecfans.com/viewer/?from=elec_h5" target="_blank" rel="noopener noreferrer" class="flex"> <!-- <div class="elc_logo"> <img src="/static/images/newdetail/elecfans_logo.png" alt=""> </div> --> <div class="ad-contxt"> <h3><span class="c_red">20万+</span>工程师都在用,<span class="c_red">免费</span>PCB检查工具</h3> <p>无需安装、支持浏览器和手机在线查看、实时共享</p> </div> <button>查看</button> </a> </div> <div id="mask" style="display:none;"></div> <div class="new-footer hideInApp"> <div class="flex-center"><a href="https://www.elecfans.com/app/download.html">电子发烧友APP</a></div> <div class="flex-center"> <div class="login-bottom"> <a href="/member.php?mod=logging&action=login" title="登录">登录</a><a href="/member.php?mod=reg" title="注册">注册</a> </div> <span class="line">|</span><a href="https://m.elecfans.com/about/tousu.html">投诉反馈</a><span class="line">|</span><a href="https://author.baidu.com/home/1563378682824805?from=dusite_artdetailh5">电子发烧友网</a></div> <div class="flex-center">© 2021 bbs.elecfans.com</div> <div class="flex-center"><a href="http://beian.miit.gov.cn/">湘ICP备2023018690号</a></div> </div> <nav id="mainNv" class="mainNv"> <div class="menuWarp"> <div class="userInfo cfix"> <a href="/member.php?mod=logging&action=login"> <div class="avatar fl"><img src="https://avatar.elecfans.com/uc_server/data/avatar/000/00/00/00_avatar_big.jpg" onerror="this.onerror=null;this.src='https://avatar.elecfans.com/uc_server/images/noavatar_big.gif'" /></div> <h3>点击登录</h3> <p>登录更多精彩功能!</p> </a> </div> <ul> <li class="nv1"><a href="/m/">首页</a></li> <li class="nv2"><a href="/m/forum.php">论坛版块</a></li> <li class="nv2"><a href="/m/group">小组</a></li> <li class="nv3"><a href="/try.html" title="免费开发板试用">免费开发板试用</a></li> <li class="nv10"><a href="https://t.elecfans.com/ebook" title="ebook">ebook</a></li> <li class="nv5"><a href="https://t.elecfans.com/live/?bbs">直播</a></li> <li class="nv6"><a 
href="/search.php?mod=forum">搜索</a></li> <li class="nv7"><a href="/member.php?mod=logging&action=login">登录</a></li> </ul> </div> </nav> </div> <!-- 是否完善资料代码 --> <div class="perfect_infomation_tip" style="top: 70%;"> <span class="no_tip_day3">×</span> <div class="perfect_infomation_tip_box go_perfect_btn"> <span class="tip_jifen_text">20</span> <div> <img class="tip_jifen" src="https://staticbbs.elecfans.com/static/image/tip_jifen.png"> </div> <div> 完善资料,<br>赚取积分 </div> </div> </div> <!-- 是否完善资料代码 --> </body> <!--用户完善资料弹窗插件--> <script src="https://staticbbs.elecfans.com/static/js/mobile/organizing/js/organizing.js?rJf" type="text/javascript"></script> <script src="https://staticbbs.elecfans.com/template/activity_201701/public/js/layer/layer.js?rJf" type="text/javascript" type="text/javascript"></script> <script src="https://staticbbs.elecfans.com/static/js/mobile/common/request.js?rJf" type="text/javascript"></script> <script type="text/javascript"> var scrollTimer var isClicked = false; //是否調取過app登錄 $(window).on("scroll",function(){ //滚动的时候悬浮缩回去 否则正常展示 $(".perfect_infomation_tip_box").css("right","0px") clearTimeout(scrollTimer); scrollTimer=setTimeout(function(){ $(".perfect_infomation_tip_box").css("right"," -70px"); },300) }) /*//判断当天是否弹出手机验证如果弹出这 //弹出是否验证手机号 然后就是完善资料 if(typeof isVerification === "function"){ if(window.localStorage.getItem("m_bbs_verification")!==newDate_current()){ //弹出是否手机验证 is_phone_verification_bbs(function(){ isPerfectInfo($, document,false,false,true) }) } }*/ //在微信中使用微信自己的分享 var setWeixinShare={};//定义默认的微信分享信息,页面如果要自定义分享,直接更改此变量即可 if(window.location.href.indexOf('/prize.html')>0){ /*抽奖页面获取tid*/ setWeixinShare.link='https://bbs.elecfans.com/prize.html?tid=0'; } if(window.navigator.userAgent.toLowerCase().match(/MicroMessenger/i) == 'micromessenger'){ var d={ title:"Mahout 0.9中的聚类(Clustering)工具的用法 - PLC - 电子技术论坛",//标题 desc:$('[name=description]').attr("content"), //描述 imgUrl:'https://bbs.elecfans.com/static/image/common/elec_logo.jpg',// 
分享图标,默认是logo link:'',//链接 type:'',// 分享类型,music、video或link,不填默认为link dataUrl:'',//如果type是music或video,则要提供数据链接,默认为空 success:'', // 用户确认分享后执行的回调函数 cancel:''// 用户取消分享后执行的回调函数 } setWeixinShare=$.extend(d,setWeixinShare); $.ajax({ url:"/app/api/index.php?s=Home/Wechat/shareUrl", data:"share_url="+encodeURIComponent(location.href)+"&format=jsonp", type:'get', dataType:'jsonp', success:function(res){ if(res.status!="successed"){ return false; } $.getScript('https://res.wx.qq.com/open/js/jweixin-1.0.0.js',function(result,status){ if(status!="success"){ return false; } var getWxCfg=res.data; wx.config({ //debug: true, // 开启调试模式,调用的所有api的返回值会在客户端alert出来,若要查看传入的参数,可以在pc端打开,参数信息会通过log打出,仅在pc端时才会打印。 appId:getWxCfg.appId, // 必填,公众号的唯一标识 timestamp:getWxCfg.timestamp, // 必填,生成签名的时间戳 nonceStr:getWxCfg.nonceStr, // 必填,生成签名的随机串 signature:getWxCfg.signature,// 必填,签名,见附录1 jsApiList:['onMenuShareTimeline','onMenuShareAppMessage','onMenuShareQQ','onMenuShareWeibo','onMenuShareQZone'] // 必填,需要使用的JS接口列表,所有JS接口列表见附录2 }); wx.ready(function(){ //获取“分享到朋友圈”按钮点击状态及自定义分享内容接口 wx.onMenuShareTimeline({ title: setWeixinShare.title, // 分享标题 link: setWeixinShare.link, // 分享链接 imgUrl: setWeixinShare.imgUrl, // 分享图标 success: function () { setWeixinShare.success; // 用户确认分享后执行的回调函数 }, cancel: function () { setWeixinShare.cancel; // 用户取消分享后执行的回调函数 } }); //获取“分享给朋友”按钮点击状态及自定义分享内容接口 wx.onMenuShareAppMessage({ title: setWeixinShare.title, // 分享标题 desc: setWeixinShare.desc, // 分享描述 link: setWeixinShare.link, // 分享链接 imgUrl: setWeixinShare.imgUrl, // 分享图标 type: setWeixinShare.type, // 分享类型,music、video或link,不填默认为link dataUrl: setWeixinShare.dataUrl, // 如果type是music或video,则要提供数据链接,默认为空 success: function () { setWeixinShare.success; // 用户确认分享后执行的回调函数 }, cancel: function () { setWeixinShare.cancel; // 用户取消分享后执行的回调函数 } }); //获取“分享到QQ”按钮点击状态及自定义分享内容接口 wx.onMenuShareQQ({ title: setWeixinShare.title, // 分享标题 desc: setWeixinShare.desc, // 分享描述 link: setWeixinShare.link, // 分享链接 imgUrl: setWeixinShare.imgUrl, // 分享图标 
success: function () { setWeixinShare.success; // 用户确认分享后执行的回调函数 }, cancel: function () { setWeixinShare.cancel; // 用户取消分享后执行的回调函数 } }); //获取“分享到腾讯微博”按钮点击状态及自定义分享内容接口 wx.onMenuShareWeibo({ title: setWeixinShare.title, // 分享标题 desc: setWeixinShare.desc, // 分享描述 link: setWeixinShare.link, // 分享链接 imgUrl: setWeixinShare.imgUrl, // 分享图标 success: function () { setWeixinShare.success; // 用户确认分享后执行的回调函数 }, cancel: function () { setWeixinShare.cancel; // 用户取消分享后执行的回调函数 } }); //获取“分享到QQ空间”按钮点击状态及自定义分享内容接口 wx.onMenuShareQZone({ title: setWeixinShare.title, // 分享标题 desc: setWeixinShare.desc, // 分享描述 link: setWeixinShare.link, // 分享链接 imgUrl: setWeixinShare.imgUrl, // 分享图标 success: function () { setWeixinShare.success; // 用户确认分享后执行的回调函数 }, cancel: function () { setWeixinShare.cancel; // 用户取消分享后执行的回调函数 } }); }); }); } }); } var ua = navigator.userAgent; ua = ua.toLocaleLowerCase(); var IS_IN_APP = ua.indexOf("appandroid")>-1 || ua.indexOf("appios")>-1; var isIos = ua.indexOf("appios") > -1; // 在app中隐藏对应模块 if(IS_IN_APP){ $(".hideInApp").hide(); $(".simple-about2").css({"paddingBottom":"2rem"}) } //点击触发app登录 function callAppFunction(funcName,p1=null) { if (IS_IN_APP) { if (ua.indexOf("appios") > -1) { try{ console.log('in util is ios') window.webkit.messageHandlers[funcName].postMessage(p1) }catch(e){ console.log(e) } } else { try{ console.info('in util is android', window.quotationSystem, funcName) window.quotationSystem[funcName]() }catch(e){ console.log(e) } } } } $(".inAppLogin").on("click", function (ev) { var url = $(this).attr("href") console.log(IS_IN_APP, isIos, url, navigator.userAgent); if (IS_IN_APP) { if (parseInt('0') == 0) { callAppFunction("h5CookieLogin") return false; } else { if (isIos) { window.location.href = url; } else { url && window.open(url) } } } else { if (isIos) { window.location.href = url; } } }); </script> </html>