Computing Document Similarity with the Vector Space Model (C#)
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="color: #000000;"><span style="font-size: small;">Readers can wrap or rewrite this code to suit their own needs; this article is only meant as a starting point.</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="color: #000000;"><span style="font-size: small;">The author's wrapper is available at:</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="color: #000000;"><span style="font-size: small;"><span style="text-decoration: underline;"><span style="color: #800080;"><a href="http://download.csdn.net/source/1143450">http://download.csdn.net/source/1143450</a></span></span><a href="http://download.csdn.net/source/1143450"></a></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="color: #000000;"><span style="font-size: small;">Introduction to the VSM model:</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="color: #000000;"><span style="font-size: small;"><span style="color: #0000ff;"><a href="http://blog.csdn.net/Felomeng/archive/2009/03/25/4024078.aspx">http://blog.csdn.net/Felomeng/archive/2009/03/25/4024078.aspx</a></span></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"></p>
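<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="color: #000000;"><span style="font-size: small;">For orientation, the Similarity methods in the listing below appear to compute the cosine of the angle between two term-frequency vectors. The notation here is an assumption made for illustration and does not come from the original article: T is the set of all terms occurring in either document, and f_{1,t} and f_{2,t} are the counts of term t in document 1 and document 2 (zero when the term is absent). In the code, numerator corresponds to the sum in the numerator, and denominator1 and denominator2 to the two sums of squares.</span></span></span></p>

```latex
\mathrm{sim}(d_1, d_2)
  = \frac{\sum_{t \in T} f_{1,t}\, f_{2,t}}
         {\sqrt{\left(\sum_{t \in T} f_{1,t}^{2}\right)
                \left(\sum_{t \in T} f_{2,t}^{2}\right)}}

% Worked example with hypothetical counts:
%   d_1 = {a: 2, b: 1},  d_2 = {a: 1, c: 1}
\mathrm{sim}(d_1, d_2)
  = \frac{2 \cdot 1 + 1 \cdot 0 + 0 \cdot 1}
         {\sqrt{(2^2 + 1^2)(1^2 + 1^2)}}
  = \frac{2}{\sqrt{10}} \approx 0.632
```

<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="color: #000000;"><span style="font-size: small;">Because term counts are non-negative, the result lies between 0 (no shared terms) and 1 (identical term proportions).</span></span></span></p>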
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style="font-size: small;"><span style="">using</span><span style=""> <span style="color: #010001;">System</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style="font-size: small;"><span style="">using</span><span style=""> <span style="color: #010001;">System</span>.<span style="color: #010001;">Collections</span>.<span style="color: #010001;">Generic</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style="font-size: small;"><span style="">using</span><span style=""> <span style="color: #010001;">System</span>.<span style="color: #010001;">Linq</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style="font-size: small;"><span style="">using</span><span style=""> <span style="color: #010001;">System</span>.<span style="color: #010001;">Text</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style="font-size: small;"><span style="">using</span><span style=""> <span style="color: #010001;">System</span>.<span style="color: #010001;">Text</span>.<span style="color: #010001;">RegularExpressions</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style="font-size: small;"><span style="">namespace</span><span style=""> <span style="color: #010001;">Felomeng</span>.<span style="color: #010001;">VSMSimilarity</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;">{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">class</span> <span style="color: #2b91af;">SVMModle</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"><summary></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> <span lang="ZH-CN">Dimensionality-reduction word list</span></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"></summary></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">private</span> <span style="color: #2b91af;">List</span><<span style="color: blue;">string</span>> <span style="color: #010001;">reducingKeys</span> = <span style="color: blue;">new</span> <span style="color: #2b91af;">List</span><<span style="color: blue;">string</span>>();</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"><summary></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> <span lang="ZH-CN">Constructor: uses a dimensionality-reduction word list</span></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"></summary></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"><param name="reducingKeys"></span><span style="color: green;" lang="ZH-CN">Dimensionality-reduction word list</span><span style="color: gray;"></param></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">public</span> <span style="color: #010001;">SVMModle</span>(<span style="color: #2b91af;">List</span><<span style="color: blue;">string</span>> <span style="color: #010001;">reducingKeys</span>)</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">this</span>.<span style="color: #010001;">reducingKeys</span> = <span style="color: #010001;">reducingKeys</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"><summary></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> <span lang="ZH-CN">Constructor: no dimensionality-reduction word list</span></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"></summary></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">public</span> <span style="color: #010001;">SVMModle</span>()</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"><summary></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> <span lang="ZH-CN">Computes the similarity of two word-segmented documents</span></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"></summary></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"><param name="text1"></span><span style="color: green;" lang="ZH-CN">Document 1 (already word-segmented; delimiters are non-Chinese characters)</span><span style="color: gray;"></param></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"><param name="text2"></span><span style="color: green;" lang="ZH-CN">Document 2 (already word-segmented; delimiters are non-Chinese characters)</span><span style="color: gray;"></param></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"><returns></span><span style="color: green;" lang="ZH-CN">Similarity of the two documents</span><span style="color: gray;"></returns></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">public</span> <span style="color: blue;">double</span> <span style="color: #010001;">Similarity</span>(<span style="color: blue;">string</span> <span style="color: #010001;">text1</span>, <span style="color: blue;">string</span> <span style="color: #010001;">text2</span>)</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">double</span> <span style="color: #010001;">similarity</span> = 0.0, <span style="color: #010001;">numerator</span> = 0.0, <span style="color: #010001;">denominator1</span> = 0.0, <span style="color: #010001;">denominator2</span> = 0.0;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">int</span> <span style="color: #010001;">temp1</span>, <span style="color: #010001;">temp2</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #2b91af;">Dictionary</span><<span style="color: blue;">string</span>, <span style="color: blue;">int</span>> <span style="color: #010001;">dictionary1</span> = <span style="color: #010001;">GetDictionary</span>(<span style="color: #010001;">text1</span>);</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #2b91af;">Dictionary</span><<span style="color: blue;">string</span>, <span style="color: blue;">int</span>> <span style="color: #010001;">dictionary2</span> = <span style="color: #010001;">GetDictionary</span>(<span style="color: #010001;">text2</span>);</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">if</span> ((<span style="color: #010001;">dictionary1</span>.<span style="color: #010001;">Count</span> < 1) || (<span style="color: #010001;">dictionary2</span>.<span style="color: #010001;">Count</span> < 1))<span style="color: green;">// <span lang="ZH-CN">If either document contains no Chinese characters</span></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">return</span> 0.0;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #2b91af;">Dictionary</span><<span style="color: blue;">string</span>, <span style="color: blue;">int</span>>.<span style="color: #2b91af;">KeyCollection</span> <span style="color: #010001;">keys1</span> = <span style="color: #010001;">dictionary1</span>.<span style="color: #010001;">Keys</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">foreach</span> (<span style="color: blue;">string</span> <span style="color: #010001;">key</span> <span style="color: blue;">in</span> <span style="color: #010001;">keys1</span>)</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">dictionary1</span>.<span style="color: #010001;">TryGetValue</span>(<span style="color: #010001;">key</span>, <span style="color: blue;">out</span> <span style="color: #010001;">temp1</span>);</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">if</span> (!<span style="color: #010001;">dictionary2</span>.<span style="color: #010001;">TryGetValue</span>(<span style="color: #010001;">key</span>, <span style="color: blue;">out</span> <span style="color: #010001;">temp2</span>))</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">temp2</span> = 0;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style=""></span><span style="color: #010001;">dictionary2</span>.<span style="color: #010001;">Remove</span>(<span style="color: #010001;">key</span>);</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">numerator</span> += <span style="color: #010001;">temp1</span> * <span style="color: #010001;">temp2</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">denominator1</span> += <span style="color: #010001;">temp1</span> * <span style="color: #010001;">temp1</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">denominator2</span> += <span style="color: #010001;">temp2</span> * <span style="color: #010001;">temp2</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #2b91af;">Dictionary</span><<span style="color: blue;">string</span>, <span style="color: blue;">int</span>>.<span style="color: #2b91af;">KeyCollection</span> <span style="color: #010001;">keys2</span> = <span style="color: #010001;">dictionary2</span>.<span style="color: #010001;">Keys</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">foreach</span> (<span style="color: blue;">string</span> <span style="color: #010001;">key</span> <span style="color: blue;">in</span> <span style="color: #010001;">keys2</span>)</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">dictionary2</span>.<span style="color: #010001;">TryGetValue</span>(<span style="color: #010001;">key</span>, <span style="color: blue;">out</span> <span style="color: #010001;">temp2</span>);</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">denominator2</span> += <span style="color: #010001;">temp2</span> * <span style="color: #010001;">temp2</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">similarity</span> = <span style="color: #010001;">numerator</span> / (<span style="color: #2b91af;">Math</span>.<span style="color: #010001;">Sqrt</span>(<span style="color: #010001;">denominator1</span> * <span style="color: #010001;">denominator2</span>));</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style=""></span><span style="color: blue;">return</span> <span style="color: #010001;">similarity</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"><summary></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> <span lang="ZH-CN">Computes the similarity of two documents from their term-frequency dictionaries</span></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"></summary></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"><param name="text1"></span><span style="color: green;" lang="ZH-CN">Term-frequency dictionary of the first document</span><span style="color: gray;"></param></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"><param name="text2"></span><span style="color: green;" lang="ZH-CN">Term-frequency dictionary of the second document</span><span style="color: gray;"></param></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"><returns></span><span style="color: green;" lang="ZH-CN">Similarity of the two documents</span><span style="color: gray;"></returns></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">public</span> <span style="color: blue;">double</span> <span style="color: #010001;">Similarity</span>(<span style="color: #2b91af;">Dictionary</span><<span style="color: blue;">string</span>, <span style="color: blue;">int</span>> <span style="color: #010001;">text1</span>, <span style="color: #2b91af;">Dictionary</span><<span style="color: blue;">string</span>, <span style="color: blue;">int</span>> <span style="color: #010001;">text2</span>)</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">double</span> <span style="color: #010001;">similarity</span> = 0.0, <span style="color: #010001;">numerator</span> = 0.0, <span style="color: #010001;">denominator1</span> = 0.0, <span style="color: #010001;">denominator2</span> = 0.0;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">int</span> <span style="color: #010001;">temp1</span>, <span style="color: #010001;">temp2</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #2b91af;">Dictionary</span><<span style="color: blue;">string</span>, <span style="color: blue;">int</span>> <span style="color: #010001;">dictionary1</span> = <span style="color: blue;">new</span> <span style="color: #2b91af;">Dictionary</span><<span style="color: blue;">string</span>,<span style="color: blue;">int</span>>( <span style="color: #010001;">text1</span>);</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #2b91af;">Dictionary</span><<span style="color: blue;">string</span>, <span style="color: blue;">int</span>> <span style="color: #010001;">dictionary2</span> = <span style="color: blue;">new</span> <span style="color: #2b91af;">Dictionary</span><<span style="color: blue;">string</span>,<span style="color: blue;">int</span>>( <span style="color: #010001;">text2</span>);</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">if</span> ((<span style="color: #010001;">dictionary1</span>.<span style="color: #010001;">Count</span> < 1) || (<span style="color: #010001;">dictionary2</span>.<span style="color: #010001;">Count</span> < 1))<span style="color: green;">// <span lang="ZH-CN">If either term-frequency dictionary is empty (the document contains no Chinese characters)</span></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">return</span> 0.0;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #2b91af;">Dictionary</span><<span style="color: blue;">string</span>, <span style="color: blue;">int</span>>.<span style="color: #2b91af;">KeyCollection</span> <span style="color: #010001;">keys1</span> = <span style="color: #010001;">dictionary1</span>.<span style="color: #010001;">Keys</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">foreach</span> (<span style="color: blue;">string</span> <span style="color: #010001;">key</span> <span style="color: blue;">in</span> <span style="color: #010001;">keys1</span>)</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">dictionary1</span>.<span style="color: #010001;">TryGetValue</span>(<span style="color: #010001;">key</span>, <span style="color: blue;">out</span> <span style="color: #010001;">temp1</span>);</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">if</span> (!<span style="color: #010001;">dictionary2</span>.<span style="color: #010001;">TryGetValue</span>(<span style="color: #010001;">key</span>, <span style="color: blue;">out</span> <span style="color: #010001;">temp2</span>))</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">temp2</span> = 0;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">dictionary2</span>.<span style="color: #010001;">Remove</span>(<span style="color: #010001;">key</span>);</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">numerator</span> += <span style="color: #010001;">temp1</span> * <span style="color: #010001;">temp2</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">denominator1</span> += <span style="color: #010001;">temp1</span> * <span style="color: #010001;">temp1</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">denominator2</span> += <span style="color: #010001;">temp2</span> * <span style="color: #010001;">temp2</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #2b91af;">Dictionary</span><<span style="color: blue;">string</span>, <span style="color: blue;">int</span>>.<span style="color: #2b91af;">KeyCollection</span> <span style="color: #010001;">keys2</span> = <span style="color: #010001;">dictionary2</span>.<span style="color: #010001;">Keys</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">foreach</span> (<span style="color: blue;">string</span> <span style="color: #010001;">key</span> <span style="color: blue;">in</span> <span style="color: #010001;">keys2</span>)</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">dictionary2</span>.<span style="color: #010001;">TryGetValue</span>(<span style="color: #010001;">key</span>, <span style="color: blue;">out</span> <span style="color: #010001;">temp2</span>);</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">denominator2</span> += <span style="color: #010001;">temp2</span> * <span style="color: #010001;">temp2</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">similarity</span> = <span style="color: #010001;">numerator</span> / (<span style="color: #2b91af;">Math</span>.<span style="color: #010001;">Sqrt</span>(<span style="color: #010001;">denominator1</span> * <span style="color: #010001;">denominator2</span>));</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">return</span> <span style="color: #010001;">similarity</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"><summary></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> <span lang="EN-US">Builds the term-frequency dictionary of a document</span></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"></summary></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"><param name="text"></span><span style="color: green;" lang="EN-US">Word-segmented document; words are separated by non-Chinese characters</span><span style="color: gray;"></param></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"><returns></span><span style="color: green;" lang="EN-US">The term-frequency dictionary of the document</span><span style="color: gray;"></returns></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">public</span> <span style="color: #2b91af;">Dictionary</span><<span style="color: blue;">string</span>, <span style="color: blue;">int</span>> <span style="color: #010001;">GetDictionary</span>(<span style="color: blue;">string</span> <span style="color: #010001;">text</span>)</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #2b91af;">Dictionary</span><<span style="color: blue;">string</span>, <span style="color: blue;">int</span>> <span style="color: #010001;">dictionary</span> = <span style="color: blue;">new</span> <span style="color: #2b91af;">Dictionary</span><<span style="color: blue;">string</span>, <span style="color: blue;">int</span>>();</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #2b91af;">Regex</span> <span style="color: #010001;">regex</span> = <span style="color: blue;">new</span> <span style="color: #2b91af;">Regex</span>(<span style="color: #a31515;">@"[\u4e00-\u9fa5]+"</span>);</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #2b91af;">MatchCollection</span> <span style="color: #010001;">results</span> = <span style="color: #010001;">regex</span>.<span style="color: #010001;">Matches</span>(<span style="color: #010001;">text</span>);</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">int</span> <span style="color: #010001;">temp</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">foreach</span> (<span style="color: #2b91af;">Match</span> <span style="color: #010001;">word</span> <span style="color: blue;">in</span> <span style="color: #010001;">results</span>)</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">if</span> (<span style="color: #010001;">dictionary</span>.<span style="color: #010001;">TryGetValue</span>(<span style="color: #010001;">word</span>.<span style="color: #010001;">Value</span>, <span style="color: blue;">out</span> <span style="color: #010001;">temp</span>))</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">dictionary</span>[<span style="color: #010001;">word</span>.<span style="color: #010001;">Value</span>] = <span style="color: #010001;">temp</span> + 1;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">else</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">dictionary</span>.<span style="color: #010001;">Add</span>(<span style="color: #010001;">word</span>.<span style="color: #010001;">Value</span>, 1);</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">return</span> <span style="color: #010001;">dictionary</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;">}</span></span></p>
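<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;">For reference, the Similarity method in the listing computes the cosine of the angle between the two term-frequency vectors. The symbols a_i and b_i are notation assumed here (they do not appear in the code) for the counts of word i in the first and second document:</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;">$$\mathrm{similarity} = \frac{\sum_i a_i b_i}{\sqrt{\left(\sum_i a_i^2\right)\left(\sum_i b_i^2\right)}}$$</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;">A small made-up example: if the first document has counts {A: 2, B: 1} and the second has {A: 1, C: 1}, then numerator = 2, denominator1 = 5 and denominator2 = 2, so</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;">$$\mathrm{similarity} = \frac{2 \cdot 1}{\sqrt{(2^2 + 1^2)(1^2 + 1^2)}} = \frac{2}{\sqrt{10}} \approx 0.632$$</span></span></p>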
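<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;">There is still plenty of room for optimization, which is left for readers to explore. With suitable optimization, the code could run considerably faster.</span></span></p>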
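<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;">As one possible direction, here is a minimal sketch of the dictionary-based similarity that avoids copying the input dictionaries and removing keys from them. The class name CosineSketch and the parameter names tf1 and tf2 are hypothetical, chosen for this illustration only. It assumes every count is positive and it has not been benchmarked against the listing.</span></span></p>

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the original class.
public static class CosineSketch
{
    // Cosine similarity of two term-frequency dictionaries, computed
    // without copying either dictionary or removing any entries.
    // Assumption: every stored count is positive.
    public static double Similarity(
        Dictionary<string, int> tf1, Dictionary<string, int> tf2)
    {
        // Same convention as the listing: an empty document gives 0.
        if (tf1.Count < 1 || tf2.Count < 1)
        {
            return 0.0;
        }

        double numerator = 0.0, norm1 = 0.0, norm2 = 0.0;

        // One pass over tf1: its squared norm, plus the dot product
        // accumulated over the words that also occur in tf2.
        foreach (KeyValuePair<string, int> pair in tf1)
        {
            norm1 += (double)pair.Value * pair.Value;
            int other;
            if (tf2.TryGetValue(pair.Key, out other))
            {
                numerator += (double)pair.Value * other;
            }
        }

        // One pass over the values of tf2 for its squared norm.
        foreach (int count in tf2.Values)
        {
            norm2 += (double)count * count;
        }

        return numerator / Math.Sqrt(norm1 * norm2);
    }
}
```

<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;">Because neither argument is modified, the same dictionaries can be reused when one document is compared against many others. Words that occur in only one document add nothing to the numerator, so they only need to be counted in the norms.</span></span></p>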
<p class="MsoNormal" style="margin: 0cm 0cm 10pt;"><span style="font-size: 6pt; line-height: 115%;"><span style="font-family: Calibri;"><span style="font-size: small;"></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="color: #000000;"><span style="font-size: small;">笔者加的壳在:</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="color: #000000;"><span style="font-size: small;"><span style="text-decoration: underline;"><span style="color: #800080;"><a href="http://download.csdn.net/source/1143450">http://download.csdn.net/source/1143450</a></span></span><a href="http://download.csdn.net/source/1143450"></a></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="color: #000000;"><span style="font-size: small;">VSM模型介绍:</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="color: #000000;"><span style="font-size: small;"><span style="color: #0000ff;"><a href="http://blog.csdn.net/Felomeng/archive/2009/03/25/4024078.aspx">http://blog.csdn.net/Felomeng/archive/2009/03/25/4024078.aspx</a></span><a href="http://blog.csdn.net/Felomeng/archive/2009/03/25/4023944.aspx"></a></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style="font-size: small;"><span style="">using</span><span style=""> <span style="color: #010001;">System</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style="font-size: small;"><span style="">using</span><span style=""> <span style="color: #010001;">System</span>.<span style="color: #010001;">Collections</span>.<span style="color: #010001;">Generic</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style="font-size: small;"><span style="">using</span><span style=""> <span style="color: #010001;">System</span>.<span style="color: #010001;">Linq</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style="font-size: small;"><span style="">using</span><span style=""> <span style="color: #010001;">System</span>.<span style="color: #010001;">Text</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style="font-size: small;"><span style="">using</span><span style=""> <span style="color: #010001;">System</span>.<span style="color: #010001;">Text</span>.<span style="color: #010001;">RegularExpressions</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style="font-size: small;"><span style="">namespace</span><span style=""> <span style="color: #010001;">Felomeng</span>.<span style="color: #010001;">VSMSimilarity</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;">{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">class</span> <span style="color: #2b91af;">SVMModle</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"><summary></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> <span lang="ZH-CN">降维词表</span></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"></summary></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">private</span> <span style="color: #2b91af;">List</span><<span style="color: blue;">string</span>> <span style="color: #010001;">reducingKeys</span> = <span style="color: blue;">new</span> <span style="color: #2b91af;">List</span><<span style="color: blue;">string</span>>();</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"><summary></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> <span lang="ZH-CN">构造函数:使用降维表</span></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"></summary></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"><param name="reducingKeys"></span><span style="color: green;" lang="ZH-CN">降维词表</span><span style="color: gray;"></param></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">public</span> <span style="color: #010001;">SVMModle</span>(<span style="color: #2b91af;">List</span><<span style="color: blue;">string</span>> <span style="color: #010001;">reducingKeys</span>)</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">this</span>.<span style="color: #010001;">reducingKeys</span> = <span style="color: #010001;">reducingKeys</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"><summary></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> <span lang="ZH-CN">构造函数:不使用降维表</span></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"></summary></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">public</span> <span style="color: #010001;">SVMModle</span>()</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"><summary></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> <span lang="ZH-CN">相似度计算</span></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"></summary></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"><param name="text1"></span><span style="color: green;" lang="ZH-CN">文档1(分好词的,分词符为非汉字字符)</span><span style="color: gray;"></param></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"><param name="text2"></span><span style="color: green;" lang="ZH-CN">文档2(分好词的,分词符为非汉字字符)</span><span style="color: gray;"></param></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"><returns></span><span style="color: green;" lang="ZH-CN">两篇文章的相似度</span><span style="color: gray;"></returns></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">public</span> <span style="color: blue;">double</span> <span style="color: #010001;">Similarity</span>(<span style="color: blue;">string</span> <span style="color: #010001;">text1</span>, <span style="color: blue;">string</span> <span style="color: #010001;">text2</span>)</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">double</span> <span style="color: #010001;">similarity</span> = 0.0, <span style="color: #010001;">numerator</span> = 0.0, <span style="color: #010001;">denominator1</span> = 0.0, <span style="color: #010001;">denominator2</span> = 0.0;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">int</span> <span style="color: #010001;">temp1</span>, <span style="color: #010001;">temp2</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #2b91af;">Dictionary</span><<span style="color: blue;">string</span>, <span style="color: blue;">int</span>> <span style="color: #010001;">dictionary1</span> = <span style="color: #010001;">GetDictionary</span>(<span style="color: #010001;">text1</span>);</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #2b91af;">Dictionary</span><<span style="color: blue;">string</span>, <span style="color: blue;">int</span>> <span style="color: #010001;">dictionary2</span> = <span style="color: #010001;">GetDictionary</span>(<span style="color: #010001;">text2</span>);</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">if</span> ((<span style="color: #010001;">dictionary1</span>.<span style="color: #010001;">Count</span> < 1) || (<span style="color: #010001;">dictionary2</span>.<span style="color: #010001;">Count</span> < 1))<span style="color: green;">//<span lang="ZH-CN">如果任一篇文章中不含有汉字</span></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">return</span> 0.0;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #2b91af;">Dictionary</span><<span style="color: blue;">string</span>, <span style="color: blue;">int</span>>.<span style="color: #2b91af;">KeyCollection</span> <span style="color: #010001;">keys1</span> = <span style="color: #010001;">dictionary1</span>.<span style="color: #010001;">Keys</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">foreach</span> (<span style="color: blue;">string</span> <span style="color: #010001;">key</span> <span style="color: blue;">in</span> <span style="color: #010001;">keys1</span>)</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">dictionary1</span>.<span style="color: #010001;">TryGetValue</span>(<span style="color: #010001;">key</span>, <span style="color: blue;">out</span> <span style="color: #010001;">temp1</span>);</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">if</span> (!<span style="color: #010001;">dictionary2</span>.<span style="color: #010001;">TryGetValue</span>(<span style="color: #010001;">key</span>, <span style="color: blue;">out</span> <span style="color: #010001;">temp2</span>))</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">temp2</span> = 0;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style=""></span><span style="color: #010001;">dictionary2</span>.<span style="color: #010001;">Remove</span>(<span style="color: #010001;">key</span>);</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">numerator</span> += <span style="color: #010001;">temp1</span> * <span style="color: #010001;">temp2</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">denominator1</span> += <span style="color: #010001;">temp1</span> * <span style="color: #010001;">temp1</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">denominator2</span> += <span style="color: #010001;">temp2</span> * <span style="color: #010001;">temp2</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #2b91af;">Dictionary</span><<span style="color: blue;">string</span>, <span style="color: blue;">int</span>>.<span style="color: #2b91af;">KeyCollection</span> <span style="color: #010001;">keys2</span> = <span style="color: #010001;">dictionary2</span>.<span style="color: #010001;">Keys</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">foreach</span> (<span style="color: blue;">string</span> <span style="color: #010001;">key</span> <span style="color: blue;">in</span> <span style="color: #010001;">keys2</span>)</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">dictionary2</span>.<span style="color: #010001;">TryGetValue</span>(<span style="color: #010001;">key</span>, <span style="color: blue;">out</span> <span style="color: #010001;">temp2</span>);</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">denominator2</span> += <span style="color: #010001;">temp2</span> * <span style="color: #010001;">temp2</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">similarity</span> = <span style="color: #010001;">numerator</span> / (<span style="color: #2b91af;">Math</span>.<span style="color: #010001;">Sqrt</span>(<span style="color: #010001;">denominator1</span> * <span style="color: #010001;">denominator2</span>));</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style=""></span><span style="color: blue;">return</span> <span style="color: #010001;">similarity</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"><summary></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> <span lang="ZH-CN">相似度计算</span></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"></summary></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"><param name="text1"></span><span style="color: green;" lang="ZH-CN">第一篇文档的词频词典</span><span style="color: gray;"></param></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"><param name="text2"></span><span style="color: green;" lang="ZH-CN">第二篇文档的词频词典</span><span style="color: gray;"></param></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"><returns></span><span style="color: green;" lang="ZH-CN">两篇文档的相似度</span><span style="color: gray;"></returns></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">public</span> <span style="color: blue;">double</span> <span style="color: #010001;">Similarity</span>(<span style="color: #2b91af;">Dictionary</span><<span style="color: blue;">string</span>, <span style="color: blue;">int</span>> <span style="color: #010001;">text1</span>, <span style="color: #2b91af;">Dictionary</span><<span style="color: blue;">string</span>, <span style="color: blue;">int</span>> <span style="color: #010001;">text2</span>)</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">double</span> <span style="color: #010001;">similarity</span> = 0.0, <span style="color: #010001;">numerator</span> = 0.0, <span style="color: #010001;">denominator1</span> = 0.0, <span style="color: #010001;">denominator2</span> = 0.0;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">int</span> <span style="color: #010001;">temp1</span>, <span style="color: #010001;">temp2</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #2b91af;">Dictionary</span><<span style="color: blue;">string</span>, <span style="color: blue;">int</span>> <span style="color: #010001;">dictionary1</span> = <span style="color: blue;">new</span> <span style="color: #2b91af;">Dictionary</span><<span style="color: blue;">string</span>,<span style="color: blue;">int</span>>( <span style="color: #010001;">text1</span>);</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #2b91af;">Dictionary</span><<span style="color: blue;">string</span>, <span style="color: blue;">int</span>> <span style="color: #010001;">dictionary2</span> = <span style="color: blue;">new</span> <span style="color: #2b91af;">Dictionary</span><<span style="color: blue;">string</span>,<span style="color: blue;">int</span>>( <span style="color: #010001;">text2</span>);</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">if</span> ((<span style="color: #010001;">dictionary1</span>.<span style="color: #010001;">Count</span> < 1) || (<span style="color: #010001;">dictionary2</span>.<span style="color: #010001;">Count</span> < 1))<span style="color: green;">// if either document contains no Chinese characters</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">return</span> 0.0;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #2b91af;">Dictionary</span><<span style="color: blue;">string</span>, <span style="color: blue;">int</span>>.<span style="color: #2b91af;">KeyCollection</span> <span style="color: #010001;">keys1</span> = <span style="color: #010001;">dictionary1</span>.<span style="color: #010001;">Keys</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">foreach</span> (<span style="color: blue;">string</span> <span style="color: #010001;">key</span> <span style="color: blue;">in</span> <span style="color: #010001;">keys1</span>)</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">dictionary1</span>.<span style="color: #010001;">TryGetValue</span>(<span style="color: #010001;">key</span>, <span style="color: blue;">out</span> <span style="color: #010001;">temp1</span>);</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">if</span> (!<span style="color: #010001;">dictionary2</span>.<span style="color: #010001;">TryGetValue</span>(<span style="color: #010001;">key</span>, <span style="color: blue;">out</span> <span style="color: #010001;">temp2</span>))</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">temp2</span> = 0;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">dictionary2</span>.<span style="color: #010001;">Remove</span>(<span style="color: #010001;">key</span>);</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">numerator</span> += (<span style="color: blue;">double</span>)<span style="color: #010001;">temp1</span> * <span style="color: #010001;">temp2</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">denominator1</span> += (<span style="color: blue;">double</span>)<span style="color: #010001;">temp1</span> * <span style="color: #010001;">temp1</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">denominator2</span> += (<span style="color: blue;">double</span>)<span style="color: #010001;">temp2</span> * <span style="color: #010001;">temp2</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #2b91af;">Dictionary</span><<span style="color: blue;">string</span>, <span style="color: blue;">int</span>>.<span style="color: #2b91af;">KeyCollection</span> <span style="color: #010001;">keys2</span> = <span style="color: #010001;">dictionary2</span>.<span style="color: #010001;">Keys</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">foreach</span> (<span style="color: blue;">string</span> <span style="color: #010001;">key</span> <span style="color: blue;">in</span> <span style="color: #010001;">keys2</span>)</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">dictionary2</span>.<span style="color: #010001;">TryGetValue</span>(<span style="color: #010001;">key</span>, <span style="color: blue;">out</span> <span style="color: #010001;">temp2</span>);</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">denominator2</span> += (<span style="color: blue;">double</span>)<span style="color: #010001;">temp2</span> * <span style="color: #010001;">temp2</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">similarity</span> = <span style="color: #010001;">numerator</span> / (<span style="color: #2b91af;">Math</span>.<span style="color: #010001;">Sqrt</span>(<span style="color: #010001;">denominator1</span> * <span style="color: #010001;">denominator2</span>));</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">return</span> <span style="color: #010001;">similarity</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"><summary></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> Builds the term-frequency dictionary of a document</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"></summary></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"><param name="text"></span><span style="color: green;">Word-segmented document; words are separated by non-Chinese characters</span><span style="color: gray;"></param></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: gray;">///</span><span style="color: green;"> </span><span style="color: gray;"><returns></span><span style="color: green;">The term-frequency dictionary of the document</span><span style="color: gray;"></returns></span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">public</span> <span style="color: #2b91af;">Dictionary</span><<span style="color: blue;">string</span>, <span style="color: blue;">int</span>> <span style="color: #010001;">GetDictionary</span>(<span style="color: blue;">string</span> <span style="color: #010001;">text</span>)</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #2b91af;">Dictionary</span><<span style="color: blue;">string</span>, <span style="color: blue;">int</span>> <span style="color: #010001;">dictionary</span> = <span style="color: blue;">new</span> <span style="color: #2b91af;">Dictionary</span><<span style="color: blue;">string</span>, <span style="color: blue;">int</span>>();</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #2b91af;">Regex</span> <span style="color: #010001;">regex</span> = <span style="color: blue;">new</span> <span style="color: #2b91af;">Regex</span>(<span style="color: #a31515;">@"[\u4e00-\u9fa5]+"</span>);</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #2b91af;">MatchCollection</span> <span style="color: #010001;">results</span> = <span style="color: #010001;">regex</span>.<span style="color: #010001;">Matches</span>(<span style="color: #010001;">text</span>);</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">int</span> <span style="color: #010001;">temp</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">foreach</span> (<span style="color: #2b91af;">Match</span> <span style="color: #010001;">word</span> <span style="color: blue;">in</span> <span style="color: #010001;">results</span>)</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">if</span> (<span style="color: #010001;">dictionary</span>.<span style="color: #010001;">TryGetValue</span>(<span style="color: #010001;">word</span>.<span style="color: #010001;">Value</span>, <span style="color: blue;">out</span> <span style="color: #010001;">temp</span>))</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">dictionary</span>[<span style="color: #010001;">word</span>.<span style="color: #010001;">Value</span>] = <span style="color: #010001;">temp</span> + 1;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">else</span></span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>{</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: #010001;">dictionary</span>.<span style="color: #010001;">Add</span>(<span style="color: #010001;">word</span>.<span style="color: #010001;">Value</span>, 1);</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span><span style="color: blue;">return</span> <span style="color: #010001;">dictionary</span>;</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;"><span style=""> </span>}</span></span></p>
<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;">}</span></span></p>
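<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;">For reference, the Similarity method above computes the cosine of the angle between the two term-frequency vectors. The notation below is introduced here for illustration and does not appear in the original code: f with subscripts 1,w and 2,w is the count of word w in the first and second document, and V is the union of the two vocabularies.</span></span></p>

```latex
\mathrm{sim}(d_1, d_2) =
  \frac{\sum_{w \in V} f_{1,w}\, f_{2,w}}
       {\sqrt{\left(\sum_{w \in V} f_{1,w}^{2}\right)\left(\sum_{w \in V} f_{2,w}^{2}\right)}}
```

<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;">In the code, numerator accumulates the sum in the top of the fraction, while denominator1 and denominator2 accumulate the two sums under the square root. As a small worked case, if the first document has counts {向量: 2, 模型: 1} and the second has {向量: 1, 文档: 2}, then the numerator is 2, both sums of squares are 5, and the similarity is 2 / sqrt(5 * 5) = 0.4.</span></span></p>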
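<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;">There is still plenty of room for optimization, which readers are encouraged to think through. With suitable optimization, the code could run considerably faster.</span></span></p>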
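<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;">As one possible direction, the sketch below computes the same cosine similarity without copying either input dictionary and without removing keys during the loop. The names VsmSketch and CosineSimilarity are hypothetical and not part of the original class, and the code assumes C# 7 or later for the inline out variable. It is a minimal sketch rather than a measured improvement.</span></span></p>

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, not part of the original post.
public static class VsmSketch
{
    // Cosine similarity of two term-frequency dictionaries.
    // Neither dictionary is copied or modified.
    public static double CosineSimilarity(
        Dictionary<string, int> text1,
        Dictionary<string, int> text2)
    {
        // Same convention as the original Similarity method:
        // an empty dictionary gives a similarity of 0.
        if (text1.Count < 1 || text2.Count < 1)
        {
            return 0.0;
        }

        double numerator = 0.0, norm1 = 0.0, norm2 = 0.0;

        // One pass over the first document: its squared norm,
        // plus the dot product over words that both documents share.
        foreach (KeyValuePair<string, int> pair in text1)
        {
            double f1 = pair.Value;
            norm1 += f1 * f1;
            if (text2.TryGetValue(pair.Key, out int f2))
            {
                numerator += f1 * f2;
            }
        }

        // One pass over the second document's counts for its squared norm.
        foreach (int f2 in text2.Values)
        {
            norm2 += (double)f2 * f2;
        }

        return numerator / Math.Sqrt(norm1 * norm2);
    }
}
```

<p class="MsoNormal" style="margin: 0cm 0cm 0pt; line-height: normal;"><span style=""><span style="font-size: small;">The dictionaries returned by GetDictionary can be passed in directly, for example VsmSketch.CosineSimilarity(GetDictionary(a), GetDictionary(b)), where a and b are two word-segmented strings. Words missing from one document contribute nothing to the numerator, so only the shared words need a lookup.</span></span></p>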
<p class="MsoNormal" style="margin: 0cm 0cm 10pt;"><span style="font-size: 6pt; line-height: 115%;"><span style="font-family: Calibri;"><span style="font-size: small;"></span></span></span></p>