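Jovany Wang (feed: https://jovany-wang.github.io/feed.xml, generated by Jekyll, updated 2021-03-31T16:23:36+00:00)
One Computer Experiment a Week
2021-03-30T00:00:00+00:00
https://jovany-wang.github.io/2021/03/30/every-expr-one-week
<p class="intro">Running lots of experiments not only consolidates theoretical knowledge but sometimes also yields unexpected insights. I aim to do one very small computer experiment every week.</p>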
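<h2 id="实验列表">Experiment List</h2>
<ul>
<li>Implement a malloc</li>
<li>Benchmark JNI function-call performance (compared with local sockets, pipes, and similar mechanisms)</li>
<li>Implement a cache simulator</li>
<li>Implement SSE2 versions of upper and lower, then benchmark and analyze them</li>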
</ul>
Paper Translation: Virtual Consensus in Delos
2021-02-22T00:00:00+00:00
https://jovany-wang.github.io/2021/02/22/translate-paper-virtual-consensus-in-delos-osdi
<p>Mahesh Balakrishnan, Jason Flinn, Chen Shen, Mihir Dharamshi, Ahmed Jafri, Xiao Shi, Santosh Ghosh, Hazem Hassan, Aaryaman Sagar, Rhed Shi, Jingming Liu, Filip Gruszczynski, Xianan Zhang, Huy Hoang, Ahmed Yossef, Francois Richard, Yee Jiun Song. Facebook, Inc.</p>
<h1 id="概要">Abstract</h1>
<p>Consensus-based replicated systems (translator's note: systems such as primary-backup or multi-primary replication) are complex, monolithic, and difficult to upgrade once deployed. As a result, deployed systems do not benefit from innovative research, and new consensus protocols rarely reach production. We propose virtualizing consensus by virtualizing the shared log API, allowing services to change consensus protocols without downtime. Virtualization splits the logic of consensus into two parts: the VirtualLog, a generic and reusable reconfiguration layer, and pluggable ordering protocols called Loglets.</p>
My Study Plan for 2021
2021-01-05T00:00:00+00:00
https://jovany-wang.github.io/2021/01/05/plan-in-2021
<p class="intro">At the start of every year people like to set themselves some goals, and so do I. Whether I finish them all is another matter, but having an outline gives the new year more direction.</p>
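<h3 id="大数据领域">Big Data</h3>
<ul>
<li>
<p>Learn the SSE2 instruction set
Vectorization has become mainstream in big data, so learning SSE2 is well worth it. It is not especially hard; the main work is switching to a vectorized way of thinking and learning to recognize where vectorization can be applied as an optimization. This has to be learned hands-on: implement operations such as the usual linear algebra routines yourself to feel the payoff of the optimization.</p>
</li>
<li>
<p>Learn CUDA programming
I previously came across RAPIDS, and I think NVIDIA has a bright future. So I will start with CUDA and build up step by step, then dig into cuDF and RAPIDS later if the chance arises.</p>
</li>
<li>
<p>Study the Flink 1.x source code
I may not manage to finish this one. I do not use Flink at work, so reading alone makes it hard to explore the details of the code; I can only move quickly through its main ideas. More importantly, I want to compare Ray Streaming, Flink, and MillWheel at a high level to understand their similarities and differences.</p>
</li>
</ul>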
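<h3 id="分布式领域">Distributed Systems</h3>
<ul>
<li>Consensus algorithms: Raft, Paxos</li>
<li>RocksDB</li>
</ul>
<h3 id="机器学习领域">Machine Learning</h3>
<ul>
<li>Continue studying machine learning (the National Taiwan University course and its labs)
In 2020 I went through some ML courses, which amounts to a very basic conceptual introduction. This area will remain a hot topic, and not learning it means being left behind, so I need to go deeper. In 2021 the focus shifts from getting started to gaining real depth. I will continue with the NTU course: first, work through the lectures and derive every formula and conclusion myself; second, complete the exercises and course projects to a good standard; third, get hands-on with one or two small projects, which can be simple rather than complex.</li>
<li>Learn PyTorch
Learning PyTorch well is an important milestone for 2021. It will let me do project work myself, and it should also help at the upper layers of big data and large-scale computing.</li>
</ul>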
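<h3 id="一定要做的项目">Projects I Must Do</h3>
<ul>
<li>openmicro
The goal is to release a first version in 2021, deploy it on a k8s cluster on Alibaba Cloud's public cloud, and have some small companies using it. I am confident in this project, so I must remind myself not to give up halfway and to see it through.</li>
<li>openmillwheel
The design is not settled yet, but I really want to build it in 2021 out of interest: it will help me organize what I worked on at Baidu and let me compare it with Flink and Ray Streaming. If time runs short, this project can be postponed.</li>
<li>metable
I have been planning this for a long time. A demo is achievable, and I believe the project has real use cases.</li>
</ul>
<h3 id="一定要看完的书籍">Books I Must Finish</h3>
<ul>
<li>《现代操作系统》 (Modern Operating Systems), Chen Haibo</li>
<li>Streaming Systems</li>
<li>《分布式实时处理系统》 (Distributed Real-Time Processing Systems)</li>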
</ul>
Partial Specialization of C++ Function Templates
2020-09-18T00:00:00+00:00
https://jovany-wang.github.io/2020/09/18/template-function-partial-sepc
<p class="intro">The C++ standard does not support partial specialization of function templates, yet in practice we often need that behavior. This article presents several general techniques for achieving it.</p>
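<h2 id="1-什么是偏特化">1. What Is Partial Specialization</h2>
<h3 id="11-类模板偏特化">1.1 Class Template Partial Specialization</h3>
<p>Partial specialization is defined in contrast to full specialization: only some of the template parameters are specialized, as in the following example:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Class template partial specialization demo</span>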
<span class="k">template</span> <span class="o"><</span><span class="k">typename</span> <span class="nc">T</span><span class="p">,</span> <span class="k">typename</span> <span class="nc">Allocator_T</span><span class="p">></span>
<span class="k">class</span> <span class="nc">MyVector</span> <span class="p">{</span>
<span class="nl">public:</span>
<span class="n">MyVector</span><span class="p">()</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="s">"Normal version."</span> <span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">};</span>
<span class="k">template</span> <span class="o"><</span><span class="k">typename</span> <span class="nc">T</span><span class="p">></span>
<span class="k">class</span> <span class="nc">MyVector</span><span class="o"><</span><span class="n">T</span><span class="p">,</span> <span class="n">DefaultAllocator</span><span class="o">></span> <span class="p">{</span>
<span class="nl">public:</span>
<span class="n">MyVector</span><span class="p">()</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="s">"Partial version."</span> <span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">};</span>
<span class="n">MyVector</span><span class="o"><</span><span class="kt">int</span><span class="p">,</span> <span class="n">MyAnotherAllocator</span><span class="o">></span> <span class="n">v1</span><span class="p">;</span>
<span class="n">MyVector</span><span class="o"><</span><span class="kt">int</span><span class="p">,</span> <span class="n">DefaultAllocator</span><span class="o">></span> <span class="n">v2</span><span class="p">;</span>
</code></pre></div></div>
<p>Output:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Normal version.
Partial version.
</code></pre></div></div>
<p>The second definition of MyVector is a partial specialization: it fixes only the template parameter <code class="language-plaintext highlighter-rouge">Allocator_T</code> to <code class="language-plaintext highlighter-rouge">DefaultAllocator</code> and leaves T open. The output confirms this: v1 uses the primary class template, while v2 uses the partially specialized class.</p>
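<h3 id="12-函数模板偏特化">1.2 Function Template Partial Specialization</h3>
<p>Following the same reasoning, let us try to partially specialize a function template:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">/// Function template partial specialization demo</span>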
<span class="k">template</span> <span class="o"><</span><span class="k">typename</span> <span class="nc">A</span><span class="p">,</span> <span class="k">typename</span> <span class="nc">B</span><span class="p">></span>
<span class="kt">void</span> <span class="nf">f</span><span class="p">(</span><span class="n">A</span> <span class="n">a</span><span class="p">,</span> <span class="n">B</span> <span class="n">b</span><span class="p">)</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="s">"Normal version."</span> <span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">template</span> <span class="o"><</span><span class="k">typename</span> <span class="nc">A</span><span class="p">></span>
<span class="kt">void</span> <span class="n">f</span><span class="o"><</span><span class="n">A</span><span class="p">,</span> <span class="kt">int</span><span class="o">></span><span class="p">(</span><span class="n">A</span> <span class="n">a</span><span class="p">,</span> <span class="kt">int</span> <span class="n">b</span><span class="p">)</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="s">"Partial version."</span> <span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">// Test code</span>
<span class="kt">int</span> <span class="n">a</span> <span class="o">=</span> <span class="mi">10</span><span class="p">;</span>
<span class="kt">double</span> <span class="n">b</span> <span class="o">=</span> <span class="mi">12</span><span class="p">;</span>
<span class="n">f</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">);</span>
<span class="n">f</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">a</span><span class="p">);</span>
</code></pre></div></div>
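<p>The intent is simple: when <code class="language-plaintext highlighter-rouge">f()</code> is called with an int as its second argument, the call should reach the partially specialized <code class="language-plaintext highlighter-rouge">f()</code> below. However, compiling this code produces the following error:</p>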
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>func_partial_demo.cc:9:6: error: <span class="k">function </span>template partial specialization is not allowed
void f<A, int><span class="o">(</span>A a, int b<span class="o">)</span> <span class="o">{</span>
^~~~~~~~~
1 error generated.
</code></pre></div></div>
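<p>The compiler's message is clear: partial specialization of function templates is not allowed.</p>
<h2 id="2-实现方案">2. Approaches</h2>
<p>As mentioned earlier, the need for this behavior comes up frequently in real development, so we need some techniques to get the effect of partially specializing a function template.</p>
<h3 id="21-借助类模板偏特化">2.1 Using Class Template Partial Specialization</h3>
<p>Because class templates can be partially specialized, a straightforward approach is to replace the function with a functor that implements <code class="language-plaintext highlighter-rouge">operator()</code>.</p>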
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Demo: emulating partial specialization with a class template
template <typename A, typename B>
class F {
public:
F(A a, B b): a_(a), b_(b) {}
void operator() () {
// Use a_ and b_ as the function's arguments
std::cout << "Normal version." << std::endl;
}
private:
A a_;
B b_;
};
template <typename A>
class F<A, int> {
public:
F(A a, int b): a_(a), b_(b) {}
void operator() () {
// Use a_ and b_ as the function's arguments
std::cout << "Partial version." << std::endl;
}
private:
A a_;
int b_;
};
// Test code
int a = 10;
double b = 12;
F<int, double>(a, b)(); // prints Normal version.
F<int, int>(a, a)(); // prints Partial version.
</code></pre></div></div>
<p>You do not have to implement <code class="language-plaintext highlighter-rouge">operator()</code>: you can keep <code class="language-plaintext highlighter-rouge">f</code> as the method name and call the object's f method instead.</p>
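<p>A common variant of this approach hides the class behind an ordinary function. The sketch below is only an illustration under assumptions: <code class="language-plaintext highlighter-rouge">FImpl</code>, <code class="language-plaintext highlighter-rouge">call</code>, and <code class="language-plaintext highlighter-rouge">dispatch_f</code> are hypothetical names that do not appear in the original code, and the functions return strings instead of printing so that the result is easy to check. The free function template forwards to a static member of a class template, and the class template is the part that gets partially specialized.</p>

```cpp
#include <string>

// Primary class template: handles every (A, B) combination by default.
template <typename A, typename B>
struct FImpl {
  static std::string call(A, B) { return "Normal version."; }
};

// Partial specialization: chosen whenever the second type is int.
template <typename A>
struct FImpl<A, int> {
  static std::string call(A, int) { return "Partial version."; }
};

// Ordinary function template that callers use; it never needs specializing
// because all the variation lives in FImpl.
template <typename A, typename B>
std::string dispatch_f(A a, B b) {
  return FImpl<A, B>::call(a, b);
}
```

<p>Callers keep normal function-call syntax: <code class="language-plaintext highlighter-rouge">dispatch_f(10, 12.0)</code> goes through the primary template, and <code class="language-plaintext highlighter-rouge">dispatch_f(10, 10)</code> goes through the specialization for int.</p>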
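<h3 id="22-使用标签分发">2.2 Using Tag Dispatch</h3>
<p>Although the C++ standard does not support partial specialization of function templates, it does support function overloading. Tag dispatch provides several overloads of an implementation function that differ only in a tag parameter and selects among them according to the argument types, which gives the effect of partial specialization.</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Tag dispatch demo</span>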
<span class="k">struct</span> <span class="nc">NormalVersionTag</span> <span class="p">{};</span>
<span class="k">struct</span> <span class="nc">IntPartialVersionTag</span> <span class="p">{};</span>
<span class="k">template</span><span class="o"><</span><span class="k">class</span> <span class="nc">T</span><span class="p">></span> <span class="k">struct</span> <span class="nc">TagDispatchTrait</span> <span class="p">{</span>
<span class="k">using</span> <span class="n">Tag</span> <span class="o">=</span> <span class="n">NormalVersionTag</span><span class="p">;</span>
<span class="p">};</span>
<span class="k">template</span><span class="o"><</span><span class="p">></span>
<span class="k">struct</span> <span class="nc">TagDispatchTrait</span><span class="o"><</span><span class="kt">int</span><span class="o">></span> <span class="p">{</span>
<span class="k">using</span> <span class="n">Tag</span> <span class="o">=</span> <span class="n">IntPartialVersionTag</span><span class="p">;</span>
<span class="p">};</span>
<span class="k">template</span> <span class="o"><</span><span class="k">typename</span> <span class="nc">A</span><span class="p">,</span> <span class="k">typename</span> <span class="nc">B</span><span class="p">></span>
<span class="kr">inline</span> <span class="kt">void</span> <span class="nf">internal_f</span><span class="p">(</span><span class="n">A</span> <span class="n">a</span><span class="p">,</span> <span class="n">B</span> <span class="n">b</span><span class="p">,</span> <span class="n">NormalVersionTag</span><span class="p">)</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="s">"Normal version."</span> <span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">template</span> <span class="o"><</span><span class="k">typename</span> <span class="nc">A</span><span class="p">,</span> <span class="k">typename</span> <span class="nc">B</span><span class="p">></span>
<span class="kr">inline</span> <span class="kt">void</span> <span class="nf">internal_f</span><span class="p">(</span><span class="n">A</span> <span class="n">a</span><span class="p">,</span> <span class="n">B</span> <span class="n">b</span><span class="p">,</span> <span class="n">IntPartialVersionTag</span><span class="p">)</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="s">"Partial version."</span> <span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">template</span> <span class="o"><</span><span class="k">typename</span> <span class="nc">A</span><span class="p">,</span> <span class="k">typename</span> <span class="nc">B</span><span class="p">></span>
<span class="kt">void</span> <span class="nf">f</span><span class="p">(</span><span class="n">A</span> <span class="n">a</span><span class="p">,</span> <span class="n">B</span> <span class="n">b</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">internal_f</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="k">typename</span> <span class="n">TagDispatchTrait</span><span class="o"><</span><span class="n">B</span><span class="o">>::</span><span class="n">Tag</span> <span class="p">{});</span>
<span class="p">}</span>
<span class="c1">// Test code</span>
<span class="kt">int</span> <span class="n">a</span> <span class="o">=</span> <span class="mi">10</span><span class="p">;</span>
<span class="kt">double</span> <span class="n">b</span> <span class="o">=</span> <span class="mi">12</span><span class="p">;</span>
<span class="n">f</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">);</span>
<span class="n">f</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">a</span><span class="p">);</span>
</code></pre></div></div>
<p>The test code above prints:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Normal version.
Partial version.
</code></pre></div></div>
<p>This approach uses function overloading to choose an implementation based on the argument types. We call this technique tag dispatch.</p>
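<p>Because the tag is computed by a class template, the trait itself can be partially specialized, which allows dispatching on a whole family of types rather than a single one. The following sketch assumes hypothetical names (<code class="language-plaintext highlighter-rouge">GenericTag</code>, <code class="language-plaintext highlighter-rouge">PointerTag</code>, <code class="language-plaintext highlighter-rouge">DispatchTrait</code>, <code class="language-plaintext highlighter-rouge">process</code>) that are not part of the original demo, and returns strings instead of printing.</p>

```cpp
#include <string>

struct GenericTag {};
struct PointerTag {};

// Default: every type maps to GenericTag.
template <typename T>
struct DispatchTrait {
  using Tag = GenericTag;
};

// Partial specialization of the trait: any pointer type maps to PointerTag.
template <typename T>
struct DispatchTrait<T*> {
  using Tag = PointerTag;
};

// Two overloads that differ only in the tag parameter.
template <typename A, typename B>
std::string process_impl(A, B, GenericTag) {
  return "generic";
}

template <typename A, typename B>
std::string process_impl(A, B, PointerTag) {
  return "pointer";
}

// Entry point: builds the tag from B and lets overload resolution pick.
template <typename A, typename B>
std::string process(A a, B b) {
  return process_impl(a, b, typename DispatchTrait<B>::Tag{});
}
```

<p>With these definitions, a call whose second argument is any pointer, such as <code class="language-plaintext highlighter-rouge">process(1, &x)</code>, reaches the <code class="language-plaintext highlighter-rouge">PointerTag</code> overload, while a call such as <code class="language-plaintext highlighter-rouge">process(1, 2.0)</code> reaches the generic one.</p>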
<h3 id="23-使用concepts">2.3 Using Concepts</h3>
<p>C++20 introduced <code class="language-plaintext highlighter-rouge">Concepts</code>. One motivation for <code class="language-plaintext highlighter-rouge">Concepts</code> is that error messages from template metaprogramming are verbose and compilers struggle to pinpoint what actually went wrong. You can think of <code class="language-plaintext highlighter-rouge">Concepts</code> as hints the programmer writes to tell the compiler what a template expects of its arguments, so that the compiler can give more accurate diagnostics. Let us see how <code class="language-plaintext highlighter-rouge">Concepts</code> makes this kind of dispatch easy.</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span> <span class="o"><</span><span class="k">typename</span> <span class="nc">A</span><span class="p">,</span> <span class="k">typename</span> <span class="nc">B</span><span class="p">></span>
<span class="kt">void</span> <span class="nf">f</span><span class="p">(</span><span class="n">A</span> <span class="n">a</span><span class="p">,</span> <span class="n">B</span> <span class="n">b</span><span class="p">)</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="s">"Normal version."</span> <span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">template</span> <span class="o"><</span><span class="k">typename</span> <span class="nc">A</span><span class="p">,</span> <span class="k">typename</span> <span class="nc">B</span><span class="p">></span>
<span class="n">requires</span> <span class="n">std</span><span class="o">::</span><span class="n">integral</span><span class="o"><</span><span class="n">B</span><span class="o">></span>
<span class="kt">void</span> <span class="nf">f</span><span class="p">(</span><span class="n">A</span> <span class="n">a</span><span class="p">,</span> <span class="n">B</span> <span class="n">b</span><span class="p">)</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="s">"Partial version."</span> <span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">// Test code</span>
<span class="kt">int</span> <span class="n">a</span> <span class="o">=</span> <span class="mi">10</span><span class="p">;</span>
<span class="kt">double</span> <span class="n">b</span> <span class="o">=</span> <span class="mi">12</span><span class="p">;</span>
<span class="n">f</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">);</span>
<span class="n">f</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">a</span><span class="p">);</span>
</code></pre></div></div>
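<p>The output is the same as before, as expected. The constrained version requires B to satisfy std::integral, that is, to be any integral type and not only int. In the call f(a, a) both versions are viable, and the compiler selects the constrained one because it is more constrained; strictly speaking it is an overload rather than a specialization. The snippet also needs the concepts standard header. Note once more that <code class="language-plaintext highlighter-rouge">Concepts</code> is available only from C++20 onward.</p>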
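<h2 id="3-总结与思考">3. Summary and Discussion</h2>
<h3 id="31-总结">3.1 Summary</h3>
<p>Each of the three approaches has its own trade-offs. The first, class template partial specialization, is logically clear and easy for any traditional C++ programmer to understand and implement. The second, tag dispatch, uses function overloading to get the effect of partial specialization. It is slightly roundabout, but <code class="language-plaintext highlighter-rouge">tag dispatch</code> is a long-established idiom (the standard library's iterator category tags work this way) and was the usual choice for a long time. The third, <code class="language-plaintext highlighter-rouge">Concepts</code>, depends on C++20; it gives the most concise and direct code and expresses type requirements and type-based selection in the language itself. As C++20 adoption grows, <code class="language-plaintext highlighter-rouge">Concepts</code> will likely become the general solution to this kind of problem.</p>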
<p><a href="https://github.com/jovany-wang/dousi/blob/307426a48d3aeabaf4920325f58d917b326c5096/core/src/core/submitter/service_handle.h#L82">https://github.com/jovany-wang/dousi/blob/307426a48d3aeabaf4920325f58d917b326c5096/core/src/core/submitter/service_handle.h#L82</a><br />
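The link above is a real-world example of tag dispatch used in place of function template partial specialization. The call to <code class="language-plaintext highlighter-rouge">InternalCaller()</code> selects an implementation based on its last argument, the tag.</p>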
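<h3 id="32-思考">3.2 Discussion</h3>
<p>We have seen that function templates cannot be partially specialized directly. Why does standard C++ disallow it? In short, template specializations do not take part in overload resolution, so combining them with overloads can produce surprising results. That is why standard C++ forbids partial specialization of function templates. One might object: if the language forbids it, is it not contradictory to work around the restriction? On reflection, it is not. The restriction exists because of the conflict between function template specialization and overload resolution, and each approach above explicitly avoids that conflict. Approach 1 uses class template partial specialization, which involves no overloading. Approach 2 uses function overloading itself as the selection mechanism. In approach 3, <code class="language-plaintext highlighter-rouge">Concepts</code> is applied to function templates, and <code class="language-plaintext highlighter-rouge">Concepts</code> is used precisely to constrain overloads, so the selection is itself an overload resolution process and the problem does not arise.</p>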
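<p>The point that specializations do not take part in overload resolution can be seen with full specializations, which the language does allow. The sketch below follows the well-known example discussed in Herb Sutter's article listed in the references; the function name <code class="language-plaintext highlighter-rouge">g</code> and the returned strings are chosen here purely for illustration.</p>

```cpp
#include <string>

// (a) A primary template.
template <typename T>
std::string g(T) {
  return "(a) primary template";
}

// (b) An explicit specialization of (a) for T = int*.
template <>
std::string g<int*>(int*) {
  return "(b) specialization of (a)";
}

// (c) A second primary template that overloads (a) for pointer arguments.
template <typename T>
std::string g(T*) {
  return "(c) overload for pointers";
}
```

<p>For an argument of type int*, overload resolution considers only the primary templates (a) and (c) and picks (c) because it is more specialized. The explicit specialization (b) belongs to (a), so it is never consulted, even though it looks like an exact match.</p>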
<p>Some related material for further reading:<br />
<a href="https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#t144-dont-specialize-function-templates">C++ Core Guidelines: T.144: Don’t specialize function templates</a><br />
<a href="http://www.gotw.ca/publications/mill17.htm">Herb Sutter: Why Not Specialize Function Templates?</a><br />
<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2307.pdf">Draft proposal: Proposed Wording for Concepts</a>, section 14.5.6.1<br />
<a href="https://stackoverflow.com/questions/5101516/why-function-template-cannot-be-partially-specialized">Stack Overflow: Why function template cannot be partially specialized?</a><br />
<a href="https://stackoverflow.com/questions/3716799/partial-specialization-of-function-templates">Stack Overflow: Partial specialization of function templates</a></p>
Implementing a RepeatedTimer with asio
2020-09-02T00:00:00+00:00
https://jovany-wang.github.io/2020/09/02/implement-repeated-timer-in-asio
<p class="intro">Anyone who has used boost::asio knows that its steady_timer is a fairly bare-bones component: it provides an asynchronous wait with a timeout, and that wait is one-shot. If you want a timer that ticks at a fixed interval like an alarm clock, you have to do a fair amount of extra work. This article walks through implementing a RepeatedTimer on top of the steady_timer in boost::asio.</p>
<h2 id="1-boostasio中的steady_timer">1. steady_timer in boost::asio</h2>
<p>For a one-shot timeout, the following code using steady_timer is enough:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">asio</span><span class="o">::</span><span class="n">io_service</span> <span class="n">io_service</span><span class="p">;</span>
<span class="n">asio</span><span class="o">::</span><span class="n">steady_timer</span> <span class="n">steady_timer</span><span class="p">{</span><span class="n">io_service</span><span class="p">};</span>
<span class="n">steady_timer</span><span class="p">.</span><span class="n">expires_from_now</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">chrono</span><span class="o">::</span><span class="n">milliseconds</span><span class="p">(</span><span class="mi">5</span> <span class="o">*</span> <span class="mi">1000</span><span class="p">));</span>
<span class="n">steady_timer</span><span class="p">.</span><span class="n">async_wait</span><span class="p">([](</span><span class="k">const</span> <span class="n">asio</span><span class="o">::</span><span class="n">error_code</span> <span class="o">&</span><span class="n">e</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">e</span><span class="p">.</span><span class="n">value</span><span class="p">()</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="s">"Time is up!"</span> <span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">});</span>
<span class="n">io_service</span><span class="p">.</span><span class="n">run</span><span class="p">();</span>
</code></pre></div></div>
<p>The code above prints:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>qingw <span class="o">></span> Time is up!
</code></pre></div></div>
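<p>After 5 seconds it prints one line, and then the program exits.
This code has two obvious limitations:<br />
(1) It cannot keep firing the timeout callback, that is, invoke the callback every 5 seconds.<br />
(2) It cannot renew the timer in a <code class="language-plaintext highlighter-rouge">lease-renewal</code> style: pressing a <code class="language-plaintext highlighter-rouge">reset()</code> button before the 5-second timeout expires should restart the countdown.</p>
<p>To solve these problems and give users a convenient, efficient, and easy-to-use timer, we can wrap steady_timer in a <code class="language-plaintext highlighter-rouge">RepeatedTimer</code>.</p>
<h2 id="2-实现一个现代化的repeatedtimer">2. Implementing a Modern RepeatedTimer</h2>
<p>First, here is how we expect the class to be used.</p>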
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">asio</span><span class="o">::</span><span class="n">io_service</span> <span class="n">io_service</span><span class="p">;</span>
<span class="n">asio</span><span class="o">::</span><span class="n">io_service</span><span class="o">::</span><span class="n">work</span> <span class="n">work</span> <span class="p">{</span><span class="n">io_service</span><span class="p">};</span> <span class="c1">// work keeps io_service from exiting when there are no pending tasks</span>
<span class="n">std</span><span class="o">::</span><span class="kr">thread</span> <span class="n">th</span> <span class="p">{[</span><span class="o">&</span><span class="n">io_service</span><span class="p">]</span> <span class="p">{</span> <span class="n">io_service</span><span class="p">.</span><span class="n">run</span><span class="p">();</span> <span class="p">}};</span>
<span class="n">RepeatedTimer</span> <span class="nf">timer</span><span class="p">(</span><span class="n">io_service</span><span class="p">,</span> <span class="p">[](</span><span class="k">const</span> <span class="n">asio</span><span class="o">::</span><span class="n">error_code</span> <span class="o">&</span><span class="n">e</span><span class="p">)</span> <span class="p">{</span> <span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="s">"Time is up!"</span><span class="p">;</span> <span class="p">});</span>
<span class="n">timer</span><span class="p">.</span><span class="n">Start</span><span class="p">(</span><span class="cm">/*ms=*/</span><span class="mi">1000</span><span class="p">);</span>
<span class="n">timer</span><span class="p">.</span><span class="n">Stop</span><span class="p">();</span>
<span class="n">timer</span><span class="p">.</span><span class="n">Reset</span><span class="p">(</span><span class="cm">/*ms=*/</span><span class="mi">2000</span><span class="p">);</span>
</code></pre></div></div>
<h3 id="21-一把梭实现">2.1 A Straightforward Implementation</h3>
<p>Based on the requirements summarized above, a direct implementation looks like this.</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">RepeatedTimer</span> <span class="k">final</span> <span class="p">{</span>
<span class="nl">public:</span>
<span class="n">RepeatedTimer</span><span class="p">(</span><span class="n">asio</span><span class="o">::</span><span class="n">io_service</span> <span class="o">&</span><span class="n">io_service</span><span class="p">,</span>
<span class="n">std</span><span class="o">::</span><span class="n">function</span><span class="o"><</span><span class="kt">void</span><span class="p">(</span><span class="k">const</span> <span class="n">asio</span><span class="o">::</span><span class="n">error_code</span> <span class="o">&</span><span class="n">e</span><span class="p">)</span><span class="o">></span> <span class="n">timeout_handler</span><span class="p">)</span>
<span class="o">:</span> <span class="n">io_service_</span><span class="p">(</span><span class="n">io_service</span><span class="p">),</span> <span class="n">timer_</span><span class="p">(</span><span class="n">io_service_</span><span class="p">),</span>
<span class="n">timeout_handler_</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">move</span><span class="p">(</span><span class="n">timeout_handler</span><span class="p">))</span> <span class="p">{}</span>
<span class="kt">void</span> <span class="n">Start</span><span class="p">(</span><span class="k">const</span> <span class="kt">uint64_t</span> <span class="n">timeout_ms</span><span class="p">)</span> <span class="p">{</span> <span class="n">Reset</span><span class="p">(</span><span class="n">timeout_ms</span><span class="p">);</span> <span class="p">}</span>
<span class="kt">void</span> <span class="n">Stop</span><span class="p">()</span> <span class="p">{</span> <span class="n">is_running_</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span> <span class="p">}</span>
<span class="kt">void</span> <span class="n">Reset</span><span class="p">(</span><span class="k">const</span> <span class="kt">uint64_t</span> <span class="n">timeout_ms</span><span class="p">)</span> <span class="p">{</span>
<span class="n">is_running_</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span>
<span class="n">DoSetExpired</span><span class="p">(</span><span class="n">timeout_ms</span><span class="p">);</span>
<span class="p">}</span>
<span class="nl">private:</span>
<span class="kt">void</span> <span class="n">DoSetExpired</span><span class="p">(</span><span class="k">const</span> <span class="kt">uint64_t</span> <span class="n">timeout_ms</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">is_running_</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span><span class="p">;</span> <span class="p">}</span>
<span class="n">timer_</span><span class="p">.</span><span class="n">expires_from_now</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">chrono</span><span class="o">::</span><span class="n">milliseconds</span><span class="p">(</span><span class="n">timeout_ms</span><span class="p">));</span>
<span class="n">timer_</span><span class="p">.</span><span class="n">async_wait</span><span class="p">([</span><span class="k">this</span><span class="p">,</span> <span class="n">timeout_ms</span><span class="p">](</span><span class="k">const</span> <span class="n">asio</span><span class="o">::</span><span class="n">error_code</span> <span class="o">&</span><span class="n">e</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">e</span><span class="p">.</span><span class="n">value</span><span class="p">()</span> <span class="o">==</span> <span class="n">asio</span><span class="o">::</span><span class="n">error</span><span class="o">::</span><span class="n">operation_aborted</span> <span class="o">||</span> <span class="o">!</span><span class="n">is_running_</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span><span class="p">;</span> <span class="p">}</span>
<span class="n">timeout_handler_</span><span class="p">(</span><span class="n">e</span><span class="p">);</span>
<span class="k">this</span><span class="o">-></span><span class="n">DoSetExpired</span><span class="p">(</span><span class="n">timeout_ms</span><span class="p">);</span>
<span class="p">});</span>
<span class="p">}</span>
<span class="nl">private:</span>
<span class="c1">// The io service that runs this timer.</span>
<span class="n">asio</span><span class="o">::</span><span class="n">io_service</span> <span class="o">&</span><span class="n">io_service_</span><span class="p">;</span>
<span class="c1">// The actual boost timer.</span>
<span class="n">asio</span><span class="o">::</span><span class="n">steady_timer</span> <span class="n">timer_</span><span class="p">;</span>
<span class="kt">bool</span> <span class="n">is_running_</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span>
<span class="c1">// The handler that will be triggered once the time's up.</span>
<span class="n">std</span><span class="o">::</span><span class="n">function</span><span class="o"><</span><span class="kt">void</span><span class="p">(</span><span class="k">const</span> <span class="n">asio</span><span class="o">::</span><span class="n">error_code</span> <span class="o">&</span><span class="n">e</span><span class="p">)</span><span class="o">></span> <span class="n">timeout_handler_</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>
<p>Test code:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Setup</span>
<span class="n">asio</span><span class="o">::</span><span class="n">io_service</span> <span class="n">io_service</span><span class="p">;</span>
<span class="n">asio</span><span class="o">::</span><span class="n">io_service</span><span class="o">::</span><span class="n">work</span> <span class="n">work</span> <span class="p">{</span><span class="n">io_service</span><span class="p">};</span>
<span class="n">std</span><span class="o">::</span><span class="kr">thread</span> <span class="n">th</span><span class="p">{[</span><span class="o">&</span><span class="n">io_service</span><span class="p">]</span> <span class="p">{</span> <span class="n">io_service</span><span class="p">.</span><span class="n">run</span><span class="p">();</span> <span class="p">}};</span>
<span class="c1">// Use RepeatedTimer</span>
<span class="n">RepeatedTimer</span> <span class="nf">timer</span><span class="p">(</span><span class="n">io_service</span><span class="p">,</span> <span class="p">[](</span><span class="k">const</span> <span class="n">asio</span><span class="o">::</span><span class="n">error_code</span> <span class="o">&</span><span class="n">e</span><span class="p">)</span> <span class="p">{</span> <span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="s">"Time is up!"</span> <span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span> <span class="p">});</span>
<span class="n">timer</span><span class="p">.</span><span class="n">Start</span><span class="p">(</span><span class="cm">/*ms=*/</span><span class="mi">1000</span><span class="p">);</span>
<span class="n">std</span><span class="o">::</span><span class="n">this_thread</span><span class="o">::</span><span class="n">sleep_for</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">chrono</span><span class="o">::</span><span class="n">milliseconds</span><span class="p">(</span><span class="mi">3</span> <span class="o">*</span> <span class="mi">1000</span><span class="p">));</span>
<span class="n">timer</span><span class="p">.</span><span class="n">Stop</span><span class="p">();</span>
<span class="n">timer</span><span class="p">.</span><span class="n">Reset</span><span class="p">(</span><span class="cm">/*ms=*/</span><span class="mi">2000</span><span class="p">);</span>
</code></pre></div></div>
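<p>The test code above first prints <code class="language-plaintext highlighter-rouge">Time is up!</code> once per second, three times in total (or possibly twice, depending on the exact timing), and from then on prints one line every 2 s.</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/// One line per second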
qingw <span class="o">></span> Time is up!
qingw <span class="o">></span> Time is up!
qingw <span class="o">></span> Time is up!
/// From here on, one line every 2 s
qingw <span class="o">></span> Time is up!
qingw <span class="o">></span> Time is up!
...
</code></pre></div></div>
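<h3 id="22-线程安全">2.2 Thread safety</h3>
<p>The RepeatedTimer implemented above is not thread-safe: it keeps its own state (<code class="language-plaintext highlighter-rouge">is_running_</code>) and branches on it without any synchronization. Making it thread-safe takes a little more work.
Here I use a read-write lock to protect that state across threads.</p>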
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">RepeatedTimer</span> <span class="k">final</span> <span class="p">{</span>
<span class="nl">public:</span>
<span class="n">RepeatedTimer</span><span class="p">(</span><span class="n">asio</span><span class="o">::</span><span class="n">io_service</span> <span class="o">&</span><span class="n">io_service</span><span class="p">,</span>
<span class="n">std</span><span class="o">::</span><span class="n">function</span><span class="o"><</span><span class="kt">void</span><span class="p">(</span><span class="k">const</span> <span class="n">asio</span><span class="o">::</span><span class="n">error_code</span> <span class="o">&</span><span class="n">e</span><span class="p">)</span><span class="o">></span> <span class="n">timeout_handler</span><span class="p">)</span>
<span class="o">:</span> <span class="n">io_service_</span><span class="p">(</span><span class="n">io_service</span><span class="p">),</span> <span class="n">timer_</span><span class="p">(</span><span class="n">io_service_</span><span class="p">),</span>
<span class="n">timeout_handler_</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">move</span><span class="p">(</span><span class="n">timeout_handler</span><span class="p">))</span> <span class="p">{}</span>
<span class="kt">void</span> <span class="n">Start</span><span class="p">(</span><span class="k">const</span> <span class="kt">uint64_t</span> <span class="n">timeout_ms</span><span class="p">)</span> <span class="p">{</span> <span class="n">Reset</span><span class="p">(</span><span class="n">timeout_ms</span><span class="p">);</span> <span class="p">}</span>
<span class="kt">void</span> <span class="n">Stop</span><span class="p">()</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">unique_lock</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">shared_mutex</span><span class="o">></span> <span class="n">guard</span> <span class="p">{</span><span class="n">shared_mutex_</span><span class="p">};</span>
<span class="n">is_running_</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="n">Reset</span><span class="p">(</span><span class="k">const</span> <span class="kt">uint64_t</span> <span class="n">timeout_ms</span><span class="p">)</span> <span class="p">{</span>
<span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">unique_lock</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">shared_mutex</span><span class="o">></span> <span class="n">guard</span> <span class="p">{</span><span class="n">shared_mutex_</span><span class="p">};</span>
<span class="n">is_running_</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">DoSetExpired</span><span class="p">(</span><span class="n">timeout_ms</span><span class="p">);</span>
<span class="p">}</span>
<span class="nl">private:</span>
<span class="kt">void</span> <span class="n">DoSetExpired</span><span class="p">(</span><span class="k">const</span> <span class="kt">uint64_t</span> <span class="n">timeout_ms</span><span class="p">)</span> <span class="p">{</span>
<span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">shared_lock</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">shared_mutex</span><span class="o">></span> <span class="n">guard</span> <span class="p">{</span><span class="n">shared_mutex_</span><span class="p">};</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">is_running_</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span><span class="p">;</span> <span class="p">}</span>
<span class="p">}</span>
<span class="n">timer_</span><span class="p">.</span><span class="n">expires_from_now</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">chrono</span><span class="o">::</span><span class="n">milliseconds</span><span class="p">(</span><span class="n">timeout_ms</span><span class="p">));</span>
<span class="n">timer_</span><span class="p">.</span><span class="n">async_wait</span><span class="p">([</span><span class="k">this</span><span class="p">,</span> <span class="n">timeout_ms</span><span class="p">](</span><span class="k">const</span> <span class="n">asio</span><span class="o">::</span><span class="n">error_code</span> <span class="o">&</span><span class="n">e</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">e</span><span class="p">.</span><span class="n">value</span><span class="p">()</span> <span class="o">==</span> <span class="n">asio</span><span class="o">::</span><span class="n">error</span><span class="o">::</span><span class="n">operation_aborted</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">shared_lock</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">shared_mutex</span><span class="o">></span> <span class="n">guard</span> <span class="p">{</span><span class="n">shared_mutex_</span><span class="p">};</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">is_running_</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">timeout_handler_</span><span class="p">(</span><span class="n">e</span><span class="p">);</span>
<span class="p">}</span>
<span class="c1">// Do this outside the locked scope to avoid locking recursively</span>
<span class="k">this</span><span class="o">-></span><span class="n">DoSetExpired</span><span class="p">(</span><span class="n">timeout_ms</span><span class="p">);</span>
<span class="p">});</span>
<span class="p">}</span>
<span class="nl">private:</span>
<span class="c1">// The io service that runs this timer.</span>
<span class="n">asio</span><span class="o">::</span><span class="n">io_service</span> <span class="o">&</span><span class="n">io_service_</span><span class="p">;</span>
<span class="c1">// The actual boost timer.</span>
<span class="n">asio</span><span class="o">::</span><span class="n">steady_timer</span> <span class="n">timer_</span><span class="p">;</span>
<span class="n">std</span><span class="o">::</span><span class="n">shared_mutex</span> <span class="n">shared_mutex_</span><span class="p">;</span>
<span class="kt">bool</span> <span class="n">is_running_</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span>
<span class="c1">// The handler that will be triggered once the time's up.</span>
<span class="n">std</span><span class="o">::</span><span class="n">function</span><span class="o"><</span><span class="kt">void</span><span class="p">(</span><span class="k">const</span> <span class="n">asio</span><span class="o">::</span><span class="n">error_code</span> <span class="o">&</span><span class="n">e</span><span class="p">)</span><span class="o">></span> <span class="n">timeout_handler_</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>
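<p>The improved version above makes RepeatedTimer safe to use from multiple threads, because its state is now protected by a read-write lock. I will not write out the test code; it is simple enough that you can write it yourself.</p>
<h3 id="23-死锁分析">2.3 Deadlock analysis</h3>
<p>The implementation above is thread-safe, and at first glance it cannot deadlock: it uses a single read-write lock and never locks recursively. As far as the component itself goes, that reasoning is correct. But when building a general-purpose component we cannot consider the component alone; we also have to consider the mental burden on its users and how safe it is to use. What does that mean? Simply put, however users use the component, nothing should go wrong!</p>
<p>Consider a user working with RepeatedTimer in a multithreaded environment. They could easily write code like this:</p>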
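<p>The locking pattern used above can be tried in isolation, without asio. The following is a minimal standard-library sketch (C++17). The names <code class="language-plaintext highlighter-rouge">RunningFlag</code> and <code class="language-plaintext highlighter-rouge">CountReadsWhileToggling</code> are made up for illustration and are not part of the timer; the sketch only mirrors how <code class="language-plaintext highlighter-rouge">is_running_</code> is guarded.</p>

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

// Hypothetical helper: a bool guarded by a read-write lock, mirroring how
// RepeatedTimer guards is_running_.
class RunningFlag {
 public:
  // Writers take the lock exclusively, like Stop() and Reset().
  void Set(bool value) {
    std::unique_lock<std::shared_mutex> guard{mutex_};
    value_ = value;
  }
  // Readers take the lock in shared mode, like the checks in DoSetExpired().
  bool Get() const {
    std::shared_lock<std::shared_mutex> guard{mutex_};
    return value_;
  }

 private:
  mutable std::shared_mutex mutex_;
  bool value_ = false;
};

// Several reader threads poll the flag while the calling thread toggles it.
// Returns the total number of reads performed.
int CountReadsWhileToggling(int readers, int reads_per_thread) {
  assert(readers >= 0 && reads_per_thread >= 0);
  RunningFlag flag;
  std::vector<int> counts(readers, 0);  // one slot per thread, so no sharing
  std::vector<std::thread> threads;
  for (int i = 0; i < readers; ++i) {
    threads.emplace_back([&flag, &counts, i, reads_per_thread] {
      for (int n = 0; n < reads_per_thread; ++n) {
        (void)flag.Get();
        ++counts[i];
      }
    });
  }
  for (int n = 0; n < 100; ++n) { flag.Set(n % 2 == 0); }
  for (auto &t : threads) { t.join(); }
  int total = 0;
  for (int c : counts) { total += c; }
  return total;
}
```

<p>Every access to the flag goes through one of the two lock types, so the readers and the writer never touch <code class="language-plaintext highlighter-rouge">value_</code> unsynchronized.</p>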
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">MyClass</span> <span class="p">{</span>
<span class="nl">public:</span>
<span class="k">explicit</span> <span class="n">MyClass</span><span class="p">(</span><span class="n">asio</span><span class="o">::</span><span class="n">io_service</span> <span class="o">&</span><span class="n">io_service</span><span class="p">)</span>
<span class="o">:</span> <span class="n">timer_</span><span class="p">(</span><span class="n">io_service</span><span class="p">,</span> <span class="p">[</span><span class="k">this</span><span class="p">](</span><span class="k">const</span> <span class="n">asio</span><span class="o">::</span><span class="n">error_code</span> <span class="o">&</span><span class="p">){</span> <span class="n">TimeoutHandler</span><span class="p">();</span> <span class="p">})</span> <span class="p">{</span>
<span class="n">timer_</span><span class="p">.</span><span class="n">Start</span><span class="p">(</span><span class="mi">1000</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="n">TimeoutHandler</span><span class="p">()</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">lock_guard</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">mutex</span><span class="o">></span> <span class="n">guard</span> <span class="p">{</span><span class="n">mutex_</span><span class="p">};</span>
<span class="n">state_</span><span class="p">.</span><span class="n">ChangeSth</span><span class="p">();</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="n">F</span><span class="p">()</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">lock_guard</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">mutex</span><span class="o">></span> <span class="n">guard</span> <span class="p">{</span><span class="n">mutex_</span><span class="p">};</span>
<span class="n">state_</span><span class="p">.</span><span class="n">ChangeSth</span><span class="p">();</span>
<span class="n">timer_</span><span class="p">.</span><span class="n">Reset</span><span class="p">(</span><span class="mi">1000</span><span class="p">);</span>
<span class="p">}</span>
<span class="nl">private:</span>
<span class="n">RepeatedTimer</span> <span class="n">timer_</span><span class="p">;</span>
<span class="c1">// The mutex that protects state_.</span>
<span class="n">std</span><span class="o">::</span><span class="n">mutex</span> <span class="n">mutex_</span><span class="p">;</span>
<span class="n">MyState</span> <span class="n">state_</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>
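<p>After some simple testing we find that this code occasionally hangs; pstack or another debugging tool shows that it has deadlocked.
<img src="/assets/img/for_posts/20190104/dead_lock_anal.jpeg" alt="Deadlock flow" />
After threads A and B have reached steps 3 and 6 respectively (in time order), A tries to acquire shared_mutex_ and B tries to acquire mutex_, but each is already held by the other thread, hence the deadlock. Concretely, A holds mutex_ inside <code class="language-plaintext highlighter-rouge">F()</code> and needs shared_mutex_ exclusively in <code class="language-plaintext highlighter-rouge">Reset()</code>, while B holds shared_mutex_ in shared mode inside the timer callback and needs mutex_ in <code class="language-plaintext highlighter-rouge">TimeoutHandler()</code>.</p>
<p>The fix is simple: do not call <code class="language-plaintext highlighter-rouge">timeout_handler_()</code> while shared_mutex_ is held in <code class="language-plaintext highlighter-rouge">DoSetExpired()</code>; shared_mutex_ should only cover changes to the timer's own state. The improved code follows. Only the locked scope inside <code class="language-plaintext highlighter-rouge">DoSetExpired()</code> needs to change.</p>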
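<p>The lock-order inversion described above can be reproduced without hanging by replacing the second, blocking acquisition on each side with <code class="language-plaintext highlighter-rouge">try_lock()</code>. This is a standard-library sketch only: <code class="language-plaintext highlighter-rouge">InversionResult</code>, <code class="language-plaintext highlighter-rouge">DemonstrateLockInversion</code> and <code class="language-plaintext highlighter-rouge">CheckInversion</code> are hypothetical names, and the two local mutexes merely stand in for <code class="language-plaintext highlighter-rouge">MyClass::mutex_</code> and the timer's <code class="language-plaintext highlighter-rouge">shared_mutex_</code>.</p>

```cpp
#include <cassert>
#include <future>
#include <mutex>
#include <shared_mutex>
#include <thread>

struct InversionResult {
  bool user_got_timer_lock;  // could the F() side take shared_mutex_?
  bool io_got_user_lock;     // could the callback side take mutex_?
};

InversionResult DemonstrateLockInversion() {
  std::mutex user_mutex;          // stands in for MyClass::mutex_
  std::shared_mutex timer_mutex;  // stands in for RepeatedTimer::shared_mutex_
  std::promise<void> io_holds_timer_lock;
  std::promise<void> user_has_tried;
  std::future<void> io_ready = io_holds_timer_lock.get_future();
  std::future<void> user_tried = user_has_tried.get_future();
  bool io_got_user_lock = true;

  // User thread, inside F(): mutex_ is held.
  std::unique_lock<std::mutex> user_guard{user_mutex};

  // io_service thread, inside the timer callback: shared_mutex_ is held (shared).
  std::thread io_thread([&] {
    std::shared_lock<std::shared_mutex> timer_guard{timer_mutex};
    io_holds_timer_lock.set_value();
    // TimeoutHandler() would block on mutex_ here; try_lock only reports it.
    io_got_user_lock = user_mutex.try_lock();
    if (io_got_user_lock) { user_mutex.unlock(); }
    user_tried.wait();  // keep the shared lock until the other side has tried
  });

  io_ready.wait();
  // Reset() would block here waiting for exclusive access to shared_mutex_.
  const bool user_got_timer_lock = timer_mutex.try_lock();
  if (user_got_timer_lock) { timer_mutex.unlock(); }
  user_has_tried.set_value();
  io_thread.join();
  return {user_got_timer_lock, io_got_user_lock};
}

// Usage: neither side can make progress, which is the deadlock condition.
void CheckInversion() {
  const InversionResult r = DemonstrateLockInversion();
  assert(!r.user_got_timer_lock);
  assert(!r.io_got_user_lock);
}
```

<p>With blocking <code class="language-plaintext highlighter-rouge">lock()</code> calls in place of <code class="language-plaintext highlighter-rouge">try_lock()</code>, both threads would wait on each other forever.</p>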
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">RepeatedTimer</span> <span class="k">final</span> <span class="p">{</span>
<span class="nl">public:</span>
<span class="c1">/// Other methods are unchanged</span>
<span class="nl">private:</span>
<span class="kt">void</span> <span class="n">DoSetExpired</span><span class="p">(</span><span class="k">const</span> <span class="kt">uint64_t</span> <span class="n">timeout_ms</span><span class="p">)</span> <span class="p">{</span>
<span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">shared_lock</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">shared_mutex</span><span class="o">></span> <span class="n">guard</span> <span class="p">{</span><span class="n">shared_mutex_</span><span class="p">};</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">is_running_</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span><span class="p">;</span> <span class="p">}</span>
<span class="p">}</span>
<span class="n">timer_</span><span class="p">.</span><span class="n">expires_from_now</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">chrono</span><span class="o">::</span><span class="n">milliseconds</span><span class="p">(</span><span class="n">timeout_ms</span><span class="p">));</span>
<span class="n">timer_</span><span class="p">.</span><span class="n">async_wait</span><span class="p">([</span><span class="k">this</span><span class="p">,</span> <span class="n">timeout_ms</span><span class="p">](</span><span class="k">const</span> <span class="n">asio</span><span class="o">::</span><span class="n">error_code</span> <span class="o">&</span><span class="n">e</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">e</span><span class="p">.</span><span class="n">value</span><span class="p">()</span> <span class="o">==</span> <span class="n">asio</span><span class="o">::</span><span class="n">error</span><span class="o">::</span><span class="n">operation_aborted</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span><span class="p">;</span> <span class="p">}</span>
<span class="p">{</span>
<span class="c1">// Note: the locked scope is narrowed to cover only the timer's own state; timeout_handler_() is called outside it</span>
<span class="n">std</span><span class="o">::</span><span class="n">shared_lock</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">shared_mutex</span><span class="o">></span> <span class="n">guard</span> <span class="p">{</span><span class="n">shared_mutex_</span><span class="p">};</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">is_running_</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span><span class="p">;</span> <span class="p">}</span>
<span class="p">}</span>
<span class="n">timeout_handler_</span><span class="p">(</span><span class="n">e</span><span class="p">);</span>
<span class="k">this</span><span class="o">-></span><span class="n">DoSetExpired</span><span class="p">(</span><span class="n">timeout_ms</span><span class="p">);</span>
<span class="p">});</span>
<span class="p">}</span>
<span class="p">};</span>
</code></pre></div></div>
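<p>OK, with this implementation RepeatedTimer can be used safely from multiple threads, and however the user code is written, the timer's lock will no longer deadlock against it. Again I will not write out the test code; it is simple enough that you can write it yourself. The read-write lock is not strictly necessary here, since read-write locks pay off in read-heavy scenarios. In RepeatedTimer the state is only read inside <code class="language-plaintext highlighter-rouge">DoSetExpired</code>, and concurrent reads happen there only in rare cases. For example, thread A calls <code class="language-plaintext highlighter-rouge">Reset()</code> and has just reached the state check at the top of <code class="language-plaintext highlighter-rouge">DoSetExpired</code> at the very moment thread B, the io_service thread running the timeout callback, enters <code class="language-plaintext highlighter-rouge">DoSetExpired</code>. Only then are there concurrent readers. So until some careful benchmarking has been done, whether a read-write lock is worth it here can only be judged from experience. Either way, it is not a big problem.</p>
<h3 id="24-原子变量实现">2.4 Implementation with an atomic variable</h3>
<p>Locks do solve the thread-safety problem, but as the reasoning and implementation above show, there are many places where you have to be very careful not to fall into a trap. The implementation above also has a notable property: the only thing we need to protect across threads is the single variable <code class="language-plaintext highlighter-rouge">is_running_</code>, not a collection of complex state. That is exactly the scenario atomic variables are made for.</p>
<p>With an atomic variable, the implementation of <code class="language-plaintext highlighter-rouge">RepeatedTimer</code> becomes remarkably simple.</p>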
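<p>The rule behind this fix, "check your own state under the lock, then release it before calling user code", does not depend on asio. Here is a small standard-library sketch of it. <code class="language-plaintext highlighter-rouge">Ticker</code>, <code class="language-plaintext highlighter-rouge">Fire()</code> and <code class="language-plaintext highlighter-rouge">FireTwiceWithReentrantHandler</code> are hypothetical names; <code class="language-plaintext highlighter-rouge">Fire()</code> stands in for one timer expiry, and a plain <code class="language-plaintext highlighter-rouge">std::mutex</code> is used instead of the read-write lock.</p>

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical stand-in for the timer: Fire() plays the role of one expiry.
class Ticker {
 public:
  explicit Ticker(std::function<void()> handler) : handler_(std::move(handler)) {
    assert(handler_ != nullptr);
  }
  void Start() {
    std::lock_guard<std::mutex> guard{mutex_};
    is_running_ = true;
  }
  void Stop() {
    std::lock_guard<std::mutex> guard{mutex_};
    is_running_ = false;
  }
  void Fire() {
    {
      // The lock covers only the ticker's own state.
      std::lock_guard<std::mutex> guard{mutex_};
      if (!is_running_) { return; }
    }
    // The user handler runs with no ticker lock held, so it may take its own
    // locks or call back into the ticker.
    handler_();
  }

 private:
  std::mutex mutex_;
  bool is_running_ = false;
  std::function<void()> handler_;
};

// Usage: the handler re-enters the ticker. If Fire() called the handler while
// holding mutex_, the Stop() call would block forever on the same mutex.
int FireTwiceWithReentrantHandler() {
  int calls = 0;
  Ticker *self = nullptr;
  Ticker ticker([&] { ++calls; self->Stop(); });
  self = &ticker;
  ticker.Start();
  ticker.Fire();  // runs the handler, which stops the ticker
  ticker.Fire();  // no-op: the ticker is stopped
  return calls;
}
```

<p>The cost of this design is a small window between the state check and the handler call: a <code class="language-plaintext highlighter-rouge">Stop()</code> issued in that window does not prevent the handler that is already about to run.</p>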
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">RepeatedTimer</span> <span class="k">final</span> <span class="p">{</span>
<span class="nl">public:</span>
<span class="n">RepeatedTimer</span><span class="p">(</span><span class="n">asio</span><span class="o">::</span><span class="n">io_service</span> <span class="o">&</span><span class="n">io_service</span><span class="p">,</span>
<span class="n">std</span><span class="o">::</span><span class="n">function</span><span class="o"><</span><span class="kt">void</span><span class="p">(</span><span class="k">const</span> <span class="n">asio</span><span class="o">::</span><span class="n">error_code</span> <span class="o">&</span><span class="n">e</span><span class="p">)</span><span class="o">></span> <span class="n">timeout_handler</span><span class="p">)</span>
<span class="o">:</span> <span class="n">io_service_</span><span class="p">(</span><span class="n">io_service</span><span class="p">),</span> <span class="n">timer_</span><span class="p">(</span><span class="n">io_service_</span><span class="p">),</span>
<span class="n">timeout_handler_</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">move</span><span class="p">(</span><span class="n">timeout_handler</span><span class="p">))</span> <span class="p">{}</span>
<span class="kt">void</span> <span class="n">Start</span><span class="p">(</span><span class="k">const</span> <span class="kt">uint64_t</span> <span class="n">timeout_ms</span><span class="p">)</span> <span class="p">{</span> <span class="n">Reset</span><span class="p">(</span><span class="n">timeout_ms</span><span class="p">);</span> <span class="p">}</span>
<span class="kt">void</span> <span class="n">Stop</span><span class="p">()</span> <span class="p">{</span> <span class="n">is_running_</span><span class="p">.</span><span class="n">store</span><span class="p">(</span><span class="nb">false</span><span class="p">);</span> <span class="p">}</span>
<span class="kt">void</span> <span class="n">Reset</span><span class="p">(</span><span class="k">const</span> <span class="kt">uint64_t</span> <span class="n">timeout_ms</span><span class="p">)</span> <span class="p">{</span>
<span class="n">is_running_</span><span class="p">.</span><span class="n">store</span><span class="p">(</span><span class="nb">true</span><span class="p">);</span>
<span class="n">DoSetExpired</span><span class="p">(</span><span class="n">timeout_ms</span><span class="p">);</span>
<span class="p">}</span>
<span class="nl">private:</span>
<span class="kt">void</span> <span class="n">DoSetExpired</span><span class="p">(</span><span class="k">const</span> <span class="kt">uint64_t</span> <span class="n">timeout_ms</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">is_running_</span><span class="p">.</span><span class="n">load</span><span class="p">())</span> <span class="p">{</span> <span class="k">return</span><span class="p">;</span> <span class="p">}</span>
<span class="n">timer_</span><span class="p">.</span><span class="n">expires_from_now</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">chrono</span><span class="o">::</span><span class="n">milliseconds</span><span class="p">(</span><span class="n">timeout_ms</span><span class="p">));</span>
<span class="n">timer_</span><span class="p">.</span><span class="n">async_wait</span><span class="p">([</span><span class="k">this</span><span class="p">,</span> <span class="n">timeout_ms</span><span class="p">](</span><span class="k">const</span> <span class="n">asio</span><span class="o">::</span><span class="n">error_code</span> <span class="o">&</span><span class="n">e</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">e</span><span class="p">.</span><span class="n">value</span><span class="p">()</span> <span class="o">==</span> <span class="n">asio</span><span class="o">::</span><span class="n">error</span><span class="o">::</span><span class="n">operation_aborted</span> <span class="o">||</span> <span class="o">!</span><span class="n">is_running_</span><span class="p">.</span><span class="n">load</span><span class="p">())</span> <span class="p">{</span> <span class="k">return</span><span class="p">;</span> <span class="p">}</span>
<span class="n">timeout_handler_</span><span class="p">(</span><span class="n">e</span><span class="p">);</span>
<span class="k">this</span><span class="o">-></span><span class="n">DoSetExpired</span><span class="p">(</span><span class="n">timeout_ms</span><span class="p">);</span>
<span class="p">});</span>
<span class="p">}</span>
<span class="nl">private:</span>
<span class="c1">// The io service that runs this timer.</span>
<span class="n">asio</span><span class="o">::</span><span class="n">io_service</span> <span class="o">&</span><span class="n">io_service_</span><span class="p">;</span>
<span class="c1">// The actual boost timer.</span>
<span class="n">asio</span><span class="o">::</span><span class="n">steady_timer</span> <span class="n">timer_</span><span class="p">;</span>
<span class="n">std</span><span class="o">::</span><span class="n">atomic</span><span class="o"><</span><span class="kt">bool</span><span class="o">></span> <span class="n">is_running_</span> <span class="o">=</span> <span class="p">{</span> <span class="nb">false</span> <span class="p">};</span>
<span class="c1">// The handler that will be triggered once the time's up.</span>
<span class="n">std</span><span class="o">::</span><span class="n">function</span><span class="o"><</span><span class="kt">void</span><span class="p">(</span><span class="k">const</span> <span class="n">asio</span><span class="o">::</span><span class="n">error_code</span> <span class="o">&</span><span class="n">e</span><span class="p">)</span><span class="o">></span> <span class="n">timeout_handler_</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>
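<p>Done. The atomic implementation is very simple and logically identical to the very first version. It makes access to the <code class="language-plaintext highlighter-rouge">is_running_</code> flag safe across threads, and it should be cheaper than the locked version, although I have not benchmarked it here. One caveat: asio documents a single steady_timer object as unsafe for concurrent use from multiple threads, so the atomic flag does not by itself protect concurrent calls into <code class="language-plaintext highlighter-rouge">timer_</code>.</p>
<h2 id="3-总结与思考">3. Summary and reflections</h2>
<p>(1) A class like <code class="language-plaintext highlighter-rouge">RepeatedTimer</code> is very useful and widely applicable. For example, common Raft implementations need a RepeatedTimer for their ElectionTimer, VoteTimer, HeartbeatTimer and so on. These timers all need the renewal capability that <code class="language-plaintext highlighter-rouge">Reset()</code> provides, and many heartbeat and lease-renewal intervals in distributed systems can use such a class as well.</p>
<p>(2) This class must take thread safety into account. It is built on asio's steady_timer, whose callbacks all run inside the io_service, most likely on the io_service's worker pool, while user threads will also operate on the timer heavily. There is therefore practically no single-threaded use case.</p>
<p>(3) When using locks, consider not only whether the component itself can deadlock, but also avoid, as far as possible, deadlocks that arise when it is combined with application code.</p>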
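<p>To see the single-flag idea in isolation, here is a standard-library analogue that needs no asio. It is a sketch under stated assumptions, not the implementation above: <code class="language-plaintext highlighter-rouge">ThreadRepeater</code> is a hypothetical class that swaps the steady_timer for a dedicated worker thread with <code class="language-plaintext highlighter-rouge">sleep_for</code>, and it assumes <code class="language-plaintext highlighter-rouge">Start()</code> and <code class="language-plaintext highlighter-rouge">Stop()</code> are called from one controlling thread and never from inside the handler.</p>

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical analogue of RepeatedTimer driven by a worker thread.
class ThreadRepeater final {
 public:
  explicit ThreadRepeater(std::function<void()> handler)
      : handler_(std::move(handler)) {
    assert(handler_ != nullptr);
  }
  ~ThreadRepeater() { Stop(); }

  void Start(std::chrono::milliseconds period) {
    // exchange() returns the previous value: do nothing if already running.
    if (is_running_.exchange(true)) { return; }
    worker_ = std::thread([this, period] {
      while (is_running_.load()) {
        std::this_thread::sleep_for(period);
        // Re-check after sleeping so a Stop() during the sleep suppresses the tick.
        if (!is_running_.load()) { break; }
        handler_();
      }
    });
  }

  // Must not be called from inside the handler: it joins the worker thread.
  void Stop() {
    is_running_.store(false);
    if (worker_.joinable()) { worker_.join(); }
  }

 private:
  std::atomic<bool> is_running_{false};
  std::function<void()> handler_;
  std::thread worker_;
};
```

<p>As in the atomic <code class="language-plaintext highlighter-rouge">RepeatedTimer</code>, the only state shared between the controlling thread and the thread that runs the handler is one <code class="language-plaintext highlighter-rouge">std::atomic&lt;bool&gt;</code>; once <code class="language-plaintext highlighter-rouge">Stop()</code> has returned, no further ticks are delivered.</p>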
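<p>(4) Atomic variables are generally suited to protecting a single piece of state, while locks are for protecting many pieces of complex state. In the latter case, even if every piece of state were an atomic variable, we would still have to keep those pieces consistent with one another, since many of them are mutually exclusive or must change together.</p>
Lessons Learned on Error Codes in Modern C++ 2020-08-18T00:00:00+00:00 https://jovany-wang.github.io/2020/08/18/status-code-in-modern-cpp
<p>Over the past few years I have worked with several large C++ projects. Many contain excellent code, but a fair amount was written without thinking things through, which seriously hurts the code's safety, readability and complexity. Today I want to discuss the problem of returning error codes in a large system.</p>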
<h2 id="一-问题">1. The problem</h2>
<p>To give users an easy-to-use interface, many large projects design their API so that every call returns a fixed error code (some systems call it a status code), for example:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// error_code.h</span>
<span class="c1">// Note: this one error-code type is used by the entire system.</span>
<span class="k">enum</span> <span class="n">ErrorCode</span> <span class="p">{</span>
<span class="c1">// More OK_XXX codes</span>
<span class="n">OK_XXX</span> <span class="o">=</span> <span class="mi">103</span><span class="p">,</span>
<span class="n">OK_RPC_FAILED</span> <span class="o">=</span> <span class="mi">102</span><span class="p">,</span>
<span class="n">OK_KYEY_NOT_FOUND</span> <span class="o">=</span> <span class="mi">101</span><span class="p">,</span>
<span class="n">OK</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span>
<span class="c1">// Networking error codes</span>
<span class="n">NETWORKING_FAILURE</span> <span class="o">=</span> <span class="o">-</span><span class="mi">101</span><span class="p">,</span>
<span class="n">HOST_ERROR</span> <span class="o">=</span> <span class="o">-</span><span class="mi">102</span><span class="p">,</span>
<span class="n">DNS_ERROR</span> <span class="o">=</span> <span class="o">-</span><span class="mi">103</span><span class="p">,</span>
<span class="c1">// Machine-load error codes</span>
<span class="n">DISK_ERROR</span> <span class="o">=</span> <span class="o">-</span><span class="mi">201</span><span class="p">,</span>
<span class="n">MEMORY_FULL</span> <span class="o">=</span> <span class="o">-</span><span class="mi">202</span><span class="p">,</span>
<span class="n">XXXX</span> <span class="o">=</span> <span class="o">-</span><span class="mi">203</span><span class="p">,</span>
<span class="n">XXXX</span> <span class="o">=</span> <span class="o">-</span><span class="mi">204</span><span class="p">,</span>
<span class="c1">// ...possibly followed by several hundred business-related error codes</span>
<span class="p">};</span>
</code></pre></div></div>
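<p>Most methods in the project are then required to return this error code, so that it can be passed straight up, layer by layer, through the call stack. Suppose we write a simple utility function split_addr() that splits a string of the form ip:port into its ip and port parts. Under this error-code convention we have to implement it like this:</p>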
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">ErrorCode</span> <span class="nf">split_addr</span><span class="p">(</span><span class="k">const</span> <span class="n">std</span><span class="o">::</span><span class="n">string</span> <span class="o">&</span><span class="n">origin_str</span><span class="p">,</span> <span class="n">std</span><span class="o">::</span><span class="n">pair</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="p">,</span> <span class="kt">int16_t</span><span class="o">></span> <span class="o">*</span><span class="n">result</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">result</span> <span class="o">==</span> <span class="nb">nullptr</span> <span class="o">||</span> <span class="n">origin_str</span><span class="p">.</span><span class="n">find</span><span class="p">(</span><span class="s">":"</span><span class="p">)</span> <span class="o">==</span> <span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">::</span><span class="n">npos</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">ErrorCode</span><span class="o">::</span><span class="n">SPLIT_INVALID_ARGUMENT</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">const</span> <span class="k">auto</span> <span class="n">split_index</span> <span class="o">=</span> <span class="n">origin_str</span><span class="p">.</span><span class="n">find</span><span class="p">(</span><span class="s">":"</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">split_index</span> <span class="o">==</span> <span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">::</span><span class="n">npos</span><span class="p">)</span> <span class="p">{</span><span class="k">return</span> <span class="n">ErrorCode</span><span class="o">::</span><span class="n">SPLIT_ERROR</span><span class="p">;}</span>
<span class="n">result</span><span class="o">-></span><span class="n">first</span> <span class="o">=</span> <span class="n">origin_str</span><span class="p">.</span><span class="n">substr</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">split_index</span><span class="p">);</span>
<span class="c1">// You could add further trimming or bounds checks here</span>
<span class="k">try</span> <span class="p">{</span>
<span class="n">result</span><span class="o">-></span><span class="n">second</span> <span class="o">=</span> <span class="k">static_cast</span><span class="o"><</span><span class="kt">int16_t</span><span class="o">></span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">stoi</span><span class="p">(</span><span class="n">origin_str</span><span class="p">.</span><span class="n">substr</span><span class="p">(</span><span class="n">split_index</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)));</span>
<span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="k">const</span> <span class="n">std</span><span class="o">::</span><span class="n">exception</span> <span class="o">&</span><span class="n">e</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// log this error</span>
<span class="k">return</span> <span class="n">ErrorCode</span><span class="o">::</span><span class="n">SPLIT_ERROR</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">return</span> <span class="n">ErrorCode</span><span class="o">::</span><span class="n">OK</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">// Test</span>
<span class="n">std</span><span class="o">::</span><span class="n">pair</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="p">,</span> <span class="kt">int16_t</span><span class="o">></span> <span class="n">ip_and_port</span><span class="p">;</span>
<span class="n">split_addr</span><span class="p">(</span><span class="s">"127.0.0.1:8888"</span><span class="p">,</span> <span class="o">&</span><span class="n">ip_and_port</span><span class="p">);</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="n">ip_and_port</span><span class="p">.</span><span class="n">first</span><span class="p">;</span> <span class="c1">// prints 127.0.0.1</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="n">ip_and_port</span><span class="p">.</span><span class="n">second</span><span class="p">;</span> <span class="c1">// prints 8888</span>
</code></pre></div></div>
<p>Think you can run it once you have written this? Of course not. You still have to add the newly defined error codes to the ErrorCode enum. How? First you have to scan the several hundred already-defined codes mentioned earlier to see whether the one you need already exists. If it does not, then fortunately you can simply append yours at the end:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">enum</span> <span class="n">ErrorCode</span> <span class="p">{</span>
<span class="c1">// More OK_XXX codes</span>
<span class="n">OK_XXX</span> <span class="o">=</span> <span class="mi">103</span><span class="p">,</span>
<span class="n">OK_RPC_FAILED</span> <span class="o">=</span> <span class="mi">102</span><span class="p">,</span>
<span class="n">OK_KYEY_NOT_FOUND</span> <span class="o">=</span> <span class="mi">101</span><span class="p">,</span>
<span class="n">OK</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span>
<span class="c1">// Networking error codes</span>
<span class="n">NETWORKING_FAILURE</span> <span class="o">=</span> <span class="o">-</span><span class="mi">101</span><span class="p">,</span>
<span class="n">HOST_ERROR</span> <span class="o">=</span> <span class="o">-</span><span class="mi">102</span><span class="p">,</span>
<span class="n">DNS_ERROR</span> <span class="o">=</span> <span class="o">-</span><span class="mi">103</span><span class="p">,</span>
<span class="c1">// Machine-load error codes</span>
<span class="n">DISK_ERROR</span> <span class="o">=</span> <span class="o">-</span><span class="mi">201</span><span class="p">,</span>
<span class="n">MEMORY_FULL</span> <span class="o">=</span> <span class="o">-</span><span class="mi">202</span><span class="p">,</span>
<span class="n">XXXX</span> <span class="o">=</span> <span class="o">-</span><span class="mi">203</span><span class="p">,</span>
<span class="n">XXXX</span> <span class="o">=</span> <span class="o">-</span><span class="mi">204</span><span class="p">,</span>
<span class="c1">// ...possibly followed by several hundred business-related error codes</span>
<span class="c1">// Added here</span>
<span class="n">SPLIT_ERROR</span> <span class="o">=</span> <span class="mi">701</span><span class="p">,</span>
<span class="n">SPLIT_INVALID_ARGUMENT</span> <span class="o">=</span> <span class="mi">702</span><span class="p">,</span>
<span class="p">};</span>
</code></pre></div></div>
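<p>OK, done: your code now runs successfully!</p>
<p>Now let us look back at the development process and think about which unsafe behaviors it contains. While writing this small split helper you pass through several dangerous steps; one slip and your code ends up in a dangerous state and running in production.
Here are some of the places where problems may hide:</p>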
<ol>
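<li>When adding our new error codes to ErrorCode, we have to check carefully whether an existing code already fits. The trouble is that for some existing codes you can tell they do not represent your new error, while others are hard to tell apart (grouping codes of the same category together helps, but the codes within a group still differ), for example INVALID_ARGUMENT, NULLPTR_ARGUMENT, and many similar definitions. These codes were written by different programmers, and it is hard to guess what each of them intended. In short, you do not know (or can hardly be sure) whether the error codes already in the codebase suit you.</li>
<li>Later, a developer on module B may need to add a split method to module B that splits one of its own strings, delimited by underscores. That developer's job can become rather painful: there are already some SPLIT-related error codes, and he has to examine carefully whether they can be reused; sometimes only some of them can. (If this point is unclear, the example below should make it clear.)</li>
<li>ErrorCode was originally the error code type used by the interfaces exposed to users, so changing it also changes the semantics that users see. Even if a comment tells users to rely only on the OK and not-OK states, the compiler cannot enforce that distinction; this is what we mean by compile-time safety. If a user, or one of our own developers, carelessly writes code like the example below, the compiler accepts it and it builds, yet such code obviously puts production at serious risk:</li>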
</ol>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// error.h</span>
<span class="c1">// Note: this is our single, global ErrorCode type; it contains every error code in the system</span>
<span class="k">enum</span> <span class="n">ErrorCode</span> <span class="p">{</span>
  <span class="c1">// Four hundred error code enumerators already live here</span>
<span class="p">};</span>
</code></pre></div></div>
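<p>Next, developer A implements the split_x() method. Following the workflow above, we need to do the following:
(1) Add the error codes that module A's split_x method needs to the global error.h</p>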
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// error.h</span>
<span class="c1">// Note: this is our single, global ErrorCode type; it contains every error code in the system</span>
<span class="k">enum</span> <span class="n">ErrorCode</span> <span class="p">{</span>
  <span class="c1">// Four hundred error code enumerators already live here</span>
  <span class="c1">// Developer A added this error code for split_x in module A</span>
  <span class="n">ERROR_SPLIT_INVALID_ARGUMENT</span> <span class="o">=</span> <span class="mi">701</span><span class="p">,</span>
<span class="p">};</span>
</code></pre></div></div>
<p>(2) Developer A implements the split_x method</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// In module A</span>
<span class="n">ErrorCode</span> <span class="nf">split_x</span><span class="p">(</span><span class="n">xxx</span><span class="p">,</span> <span class="n">yyy</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">if</span> <span class="p">(</span><span class="cm">/* some check fails */</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">return</span> <span class="n">ErrorCode</span><span class="o">::</span><span class="n">ERROR_SPLIT_INVALID_ARGUMENT</span><span class="p">;</span>
  <span class="p">}</span>
  <span class="c1">// It may return a few other error codes as well, but that does not matter here</span>
  <span class="k">return</span> <span class="n">ErrorCode</span><span class="o">::</span><span class="n">OK</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Since everyone adds their local status codes to error.h, a few months later <code class="language-plaintext highlighter-rouge">error.h</code> may have gained another 10 error codes and look like this:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// error.h</span>
<span class="c1">// Note: this is our single, global ErrorCode type; it contains every error code in the system</span>
<span class="k">enum</span> <span class="n">ErrorCode</span> <span class="p">{</span>
  <span class="c1">// Four hundred error code enumerators already live here</span>
  <span class="c1">// Developer A added this error code for split_x in module A</span>
  <span class="n">ERROR_SPLIT_INVALID_ARGUMENT</span> <span class="o">=</span> <span class="mi">701</span><span class="p">,</span>
  <span class="c1">// Another 10 error codes have been added here</span>
<span class="p">};</span>
</code></pre></div></div>
<p>Then developer B needs to write his own helper method in module B. It is also split-related, but not quite the same, so his workflow resembles developer A's:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// In module B</span>
<span class="n">ErrorCode</span> <span class="nf">split_y</span><span class="p">(</span><span class="n">xxx</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">if</span> <span class="p">(</span><span class="n">xxx</span> <span class="o">==</span> <span class="nb">nullptr</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">return</span> <span class="n">ErrorCode</span><span class="o">::</span><span class="n">ERROR_SPLIT_NULLPTR</span><span class="p">;</span>
  <span class="p">}</span>
  <span class="k">return</span> <span class="n">ErrorCode</span><span class="o">::</span><span class="n">OK</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
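<p>Developer B then has two things to do:
First, read <code class="language-plaintext highlighter-rouge">error.h</code> and check whether the error codes returned by <code class="language-plaintext highlighter-rouge">split_y()</code> are already defined in <code class="language-plaintext highlighter-rouge">error.h</code>; if all of them are, no new code is added. This is the process described in point 2 above: developer B has to examine the existing codes carefully, because he does not know whether developer A names error codes the way he himself would. For example, B needs an <code class="language-plaintext highlighter-rouge">ERROR_SPLIT_NULLPTR</code> code. There is no <code class="language-plaintext highlighter-rouge">ERROR_SPLIT_NULLPTR</code>, but there is an <code class="language-plaintext highlighter-rouge">ERROR_SPLIT_INVALID_ARGUMENT</code>. So B has to look up where that code is used, that is, read developer A's split_x method to see how it uses the code and whether <code class="language-plaintext highlighter-rouge">ERROR_SPLIT_INVALID_ARGUMENT</code> can also represent his own null-pointer argument. At this point we start to worry for developer B, because his task is simply too hard. In fact, developing split_y in module B gives him no responsibility to read the implementation of <code class="language-plaintext highlighter-rouge">split_x</code> in module A, and he should not have to.
Second, after careful examination he decides that a new <code class="language-plaintext highlighter-rouge">ERROR_SPLIT_NULLPTR</code> code is needed after all. Either way, at this point B is finally off the hook:</p>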
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// error.h</span>
<span class="c1">// Note: this is our single, global ErrorCode type; it contains every error code in the system</span>
<span class="k">enum</span> <span class="n">ErrorCode</span> <span class="p">{</span>
  <span class="c1">// Four hundred error code enumerators already live here</span>
  <span class="c1">// Developer A added this error code for split_x in module A</span>
  <span class="n">ERROR_SPLIT_INVALID_ARGUMENT</span> <span class="o">=</span> <span class="mi">701</span><span class="p">,</span>
  <span class="c1">// Another 10 error codes have been added here</span>
  <span class="n">ERROR_SPLIT_NULLPTR</span> <span class="o">=</span> <span class="mi">801</span><span class="p">,</span> <span class="c1">// Added by developer B, who may have added other codes as well</span>
<span class="p">};</span>
</code></pre></div></div>
<p>Do you still remember what the codebase looks like by now? A quick recap: <code class="language-plaintext highlighter-rouge">error.h</code> has several hundred error codes, among them <code class="language-plaintext highlighter-rouge">ERROR_SPLIT_INVALID_ARGUMENT</code>, which developer A returns from <code class="language-plaintext highlighter-rouge">split_x()</code>, and <code class="language-plaintext highlighter-rouge">ERROR_SPLIT_NULLPTR</code>, which developer B uses for <code class="language-plaintext highlighter-rouge">split_y()</code> in module B. Now developer C at some point calls <code class="language-plaintext highlighter-rouge">split_y()</code> from module B, but C is really not that sharp and writes the following code:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Code snippet c</span>
<span class="kt">void</span> <span class="nf">f</span><span class="p">()</span> <span class="p">{</span>
  <span class="c1">// do business</span>
  <span class="k">if</span> <span class="p">(</span><span class="n">ErrorCode</span><span class="o">::</span><span class="n">ERROR_SPLIT_INVALID_ARGUMENT</span> <span class="o">==</span> <span class="n">split_y</span><span class="p">(</span><span class="n">result</span><span class="p">))</span> <span class="p">{</span>
    <span class="c1">// Handle the error</span>
  <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
    <span class="c1">// Normal flow: do some work that uses result</span>
  <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
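<p>If result, after every imaginable twist and turn, arrives here as a nullptr, terrible consequences follow. In the mild case the else branch uses result and the program core dumps. In the severe case the else branch does not use result itself but passes it on along some winding path to a place you cannot even identify, so that when production finally breaks you can hardly trace the failure back to its origin. The root cause is that <code class="language-plaintext highlighter-rouge">split_y</code> returns <code class="language-plaintext highlighter-rouge">ERROR_SPLIT_NULLPTR</code> for a null pointer, not <code class="language-plaintext highlighter-rouge">ERROR_SPLIT_INVALID_ARGUMENT</code>, so execution falls into the else branch and things go wrong.</p>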
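<p>The failure mode can be reproduced in a few lines. The sketch below is only an illustration: the reduced enum, the signature of <code class="language-plaintext highlighter-rouge">split_y</code>, and the helper <code class="language-plaintext highlighter-rouge">took_normal_path</code> are assumptions made up for this example, not code from the real project.</p>

```cpp
#include <cassert>
#include <string>

// Hypothetical miniature of the global error.h: one plain enum shared by every module.
enum ErrorCode {
  OK = 0,
  ERROR_SPLIT_INVALID_ARGUMENT = 701,  // added by developer A for split_x
  ERROR_SPLIT_NULLPTR = 801,           // added by developer B for split_y
};

// Developer B's helper: reports a null argument with ERROR_SPLIT_NULLPTR.
ErrorCode split_y(const std::string *input) {
  if (input == nullptr) {
    return ERROR_SPLIT_NULLPTR;
  }
  return OK;
}

// Developer C's check: it compares against the wrong enumerator.
// Returns true when the caller believes it is on the "normal" path.
bool took_normal_path(const std::string *input) {
  if (ERROR_SPLIT_INVALID_ARGUMENT == split_y(input)) {
    return false;  // error handling
  }
  return true;  // normal flow: the real code would go on to use `input`
}
```

<p>This compiles without complaint, and <code class="language-plaintext highlighter-rouge">took_normal_path(nullptr)</code> returns true: the null pointer is treated as a success, which is exactly the situation described above.</p>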
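<p>Why is this so hard to prevent? Because there are only two ways to keep developer C's code out of your codebase: <strong>unit tests and code review</strong>. And C's mistake is hard for either of them to catch. Start with unit tests. Business logic is often complex, and C++ is not a memory-safe language, so it is hard to know how to construct a unit test that makes result null before the <code class="language-plaintext highlighter-rouge">if()</code>. Since f is business logic rather than a standalone helper, C's unit tests can only cover the cases of that business layer, and this case is clearly not easy to cover. As for the tests of <code class="language-plaintext highlighter-rouge">split_y()</code>, those are the unit tests developer B wrote for <code class="language-plaintext highlighter-rouge">split_y()</code>. B's tests obviously pass, because they pass exactly when ERROR_SPLIT_NULLPTR is returned, but developer C knows nothing about that. Now code review. Unless the reviewer knows exactly what each of the several hundred enumerators in the error code means, he can hardly catch the mistake in this if(). Worse, the reviewer's first instinct on seeing this if() is to approve of it: the code does check an error code. (The main reason is that a reviewer can only check that the logic and structure look right; the details of what actually happens at run time are beyond his reach.) Well, that is roughly the story.</p>
<p>Clearly, the problem above can have many causes. Whatever they are, for developers working in C++, an extremely dangerous language (dangerous in the sense that you must be careful everywhere), eliminating unsafe behavior at the root is a basic rule.</p>
<p>Before introducing safer return values, let me stress the notion of compile-time safety once more. We cannot assume that other developers (ourselves included) have unlimited wisdom and handle every detail with care. Instead we should assume that every other developer is a “big fool”, or even a “contrarian” who deliberately writes unsafe code to break the program, and we should treat this as the common case. To deal with it, we need the safer techniques and methodology of modern C++ to leave the “contrarian” nowhere to hide: whenever he writes code that could cause a problem, his build fails at compile time. That is what compile-time safety means. (As an aside, the C++20 concepts proposal targets exactly this: compile-time safety checks plus human-readable compile errors.)</p>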
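<h2 id="二-原则">II. Principles</h2>
<p>Modern C++ error codes, or status codes, do not actually involve any new Modern C++ tricks. It is only a matter of getting this right in mindset, strategy, and design.</p>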
<ul>
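<li>Principle 1: Any change to anything related to a function signature counts as an API change (maintain a stable API)</li>
<li>Principle 1.5: “Design” does not depend on “implementation”; “implementation” may drive iterations of “design”</li>
<li>Principle 2: Decouple modules; define the dependencies between modules and the semantics of their interactions (that is, do not let one error code rule them all)</li>
<li>Principle 3: Narrow the scope of each error code, and keep iterating on the different kinds of error codes</li>
<li>Principle 4: The most important principle: treat everyone else as a contrarian</li>
<li>Principle 5: Error codes that users never observe should not be part of the API's error codes</li>
<li>Principle 6: Use enum class instead of raw enum</li>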
</ul>
<h2 id="三-简单改进">III. A Simple Improvement</h2>
<p>Based on the principles above, we can implement a simple, safer return-code mechanism:</p>
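<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// error.h</span>
<span class="cp">#include</span> <span class="cpf"><cstdint></span>
<span class="cp">#include</span> <span class="cpf"><string></span>
<span class="cp">#include</span> <span class="cpf"><utility></span>
<span class="c1">// Abstract the user-facing API and provide a stable, safe API.</span>
<span class="c1">// Note: this is the error code returned to user code. It should be as stable as the API itself and must not be changed casually.</span>
<span class="c1">// By principle 1, ErrorCode is the return type in API method signatures, so any change to it is an API-level change.</span>
<span class="c1">// A compatible change should go through a minor version; a new error code value that is not compatible requires a major version.</span>
<span class="k">enum</span> <span class="k">class</span> <span class="nc">ErrorCode</span> <span class="o">:</span> <span class="kt">uint16_t</span> <span class="p">{</span>
  <span class="n">OK</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span>
  <span class="n">X_NOT_FOUND</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span>
  <span class="n">B_IS_FULL</span> <span class="o">=</span> <span class="mi">2</span><span class="p">,</span>
  <span class="c1">// ... more codes</span>
<span class="p">};</span>
<span class="k">class</span> <span class="nc">Error</span> <span class="p">{</span>
<span class="nl">public:</span>
  <span class="c1">// some useful methods.</span>
  <span class="k">static</span> <span class="n">Error</span> <span class="n">ok</span><span class="p">()</span> <span class="p">{</span> <span class="k">return</span> <span class="n">Error</span><span class="p">(</span><span class="n">ErrorCode</span><span class="o">::</span><span class="n">OK</span><span class="p">,</span> <span class="s">""</span><span class="p">);</span> <span class="p">}</span>
  <span class="k">static</span> <span class="n">Error</span> <span class="n">x_not_found</span><span class="p">()</span> <span class="p">{</span> <span class="k">return</span> <span class="n">Error</span><span class="p">(</span><span class="n">ErrorCode</span><span class="o">::</span><span class="n">X_NOT_FOUND</span><span class="p">,</span> <span class="s">"x not found"</span><span class="p">);</span> <span class="p">}</span>
  <span class="c1">// ...</span>
  <span class="kt">bool</span> <span class="n">is_ok</span><span class="p">()</span> <span class="k">const</span> <span class="p">{</span> <span class="k">return</span> <span class="n">code</span> <span class="o">==</span> <span class="n">ErrorCode</span><span class="o">::</span><span class="n">OK</span><span class="p">;</span> <span class="p">}</span>
  <span class="c1">// ...</span>
<span class="nl">private:</span>
  <span class="n">Error</span><span class="p">(</span><span class="n">ErrorCode</span> <span class="n">code</span><span class="p">,</span> <span class="n">std</span><span class="o">::</span><span class="n">string</span> <span class="n">message</span><span class="p">)</span> <span class="o">:</span> <span class="n">code</span><span class="p">(</span><span class="n">code</span><span class="p">),</span> <span class="n">message</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">move</span><span class="p">(</span><span class="n">message</span><span class="p">))</span> <span class="p">{}</span>
  <span class="n">ErrorCode</span> <span class="n">code</span><span class="p">;</span>
  <span class="n">std</span><span class="o">::</span><span class="n">string</span> <span class="n">message</span><span class="p">;</span>
  <span class="c1">// A detail delegate could be added here so that a sub-module's error code can be carried as part of the error.</span>
  <span class="c1">// That part is covered separately later; here we focus on the safety of error codes.</span>
<span class="p">};</span>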
</code></pre></div></div>
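<p>To show how a caller would consume such a type, here is a minimal sketch. It assumes one possible completion of the <code class="language-plaintext highlighter-rouge">Error</code> class; the accessors <code class="language-plaintext highlighter-rouge">code()</code> and <code class="language-plaintext highlighter-rouge">message()</code>, the <code class="language-plaintext highlighter-rouge">[[nodiscard]]</code> attribute (C++17), and the function <code class="language-plaintext highlighter-rouge">find_x</code> are illustrative additions, not part of the design above.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

enum class ErrorCode : std::uint16_t { OK = 0, X_NOT_FOUND = 1, B_IS_FULL = 2 };

// [[nodiscard]] asks the compiler to warn when a returned Error is ignored.
class [[nodiscard]] Error {
 public:
  static Error ok() { return Error(ErrorCode::OK, ""); }
  static Error x_not_found() { return Error(ErrorCode::X_NOT_FOUND, "x not found"); }

  bool is_ok() const { return code_ == ErrorCode::OK; }
  ErrorCode code() const { return code_; }
  const std::string &message() const { return message_; }

 private:
  // Private constructor: callers can only build errors through the named factories.
  Error(ErrorCode code, std::string message)
      : code_(code), message_(std::move(message)) {}

  ErrorCode code_;
  std::string message_;
};

// Hypothetical API function: an empty key stands in for "not found".
Error find_x(const std::string &key) {
  if (key.empty()) {
    return Error::x_not_found();
  }
  return Error::ok();
}
```

<p>A caller writes <code class="language-plaintext highlighter-rouge">if (!find_x(key).is_ok()) { ... }</code>. Because <code class="language-plaintext highlighter-rouge">ErrorCode</code> is a scoped enum, a statement such as <code class="language-plaintext highlighter-rouge">int n = ErrorCode::OK;</code> does not compile, so the code cannot silently leak into integer comparisons.</p>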
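<p>Next, across modules: first settle the dependencies and interactions between modules, and only then decide how to define the error codes between them (you may not need error codes at all). This part depends on the business, so I cannot easily express it in code. Either way, the basic rule is to define module dependencies first and error codes second, and to avoid reusing the API's error codes unless the meaning is exactly the same.</p>
<p>Now, what should developers A and B do when each implements his own split? There are many possible approaches, but one point matters:
the error codes returned by split must never be added to the API's error code. Developer A can simply return an inner error code:</p>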
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">enum</span> <span class="k">class</span> <span class="nc">ModuleAXXXErrorCode</span> <span class="p">{</span>
  <span class="n">SPLIT_INVALID_ARGUMENT</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span>
  <span class="c1">// ... other codes used only inside module A</span>
<span class="p">};</span>
<span class="n">ModuleAXXXErrorCode</span> <span class="nf">split</span><span class="p">()</span> <span class="p">{</span>
  <span class="c1">// ......</span>
<span class="p">}</span>
</code></pre></div></div>
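<p>Developer B can likewise implement his own split helper in his module and return his own inner error code, or anything else he considers appropriate. In this example B returns a bool to indicate whether split succeeded; that is perfectly fine, and it is compile-time safe. If you want finer-grained information about what went wrong inside split, define the return value at the granularity you need, but in general experience a bool is quite enough.</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">bool</span> <span class="nf">split</span><span class="p">()</span> <span class="p">{</span>
  <span class="c1">// ... do the split ...</span>
  <span class="k">return</span> <span class="nb">true</span><span class="p">;</span> <span class="c1">// or false if the split failed</span>
<span class="p">}</span>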
</code></pre></div></div>
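<p>Once each module has its own scoped error type, developer C's mix-up stops compiling. The sketch below checks this at compile time. The type names and the helper trait <code class="language-plaintext highlighter-rouge">is_equality_comparable</code> are hypothetical and exist only for this illustration; <code class="language-plaintext highlighter-rouge">std::void_t</code> requires C++17.</p>

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical module-local error types; the names are illustrative.
enum class ModuleAErrorCode { OK = 0, SPLIT_INVALID_ARGUMENT = 1 };
enum class ModuleBErrorCode { OK = 0, SPLIT_NULLPTR = 1 };

// Detects at compile time whether `T == U` is a valid expression.
template <typename T, typename U, typename = void>
struct is_equality_comparable : std::false_type {};

template <typename T, typename U>
struct is_equality_comparable<
    T, U, std::void_t<decltype(std::declval<T>() == std::declval<U>())>>
    : std::true_type {};

// Same scoped enum on both sides: allowed.
static_assert(is_equality_comparable<ModuleBErrorCode, ModuleBErrorCode>::value,
              "a module's codes compare with themselves");
// Developer C's mix-up (A's code against B's result): rejected by the compiler.
static_assert(!is_equality_comparable<ModuleAErrorCode, ModuleBErrorCode>::value,
              "codes from different modules must not be comparable");
// Scoped enums do not convert implicitly to int either.
static_assert(!std::is_convertible<ModuleBErrorCode, int>::value,
              "scoped enums do not decay to integers");

// Module B's helper can only return module B's codes.
ModuleBErrorCode split_y(const char *input) {
  return input == nullptr ? ModuleBErrorCode::SPLIT_NULLPTR
                          : ModuleBErrorCode::OK;
}
```

<p>Note that the protection comes from narrowing the scope (principle 3) together with <code class="language-plaintext highlighter-rouge">enum class</code>: a single global <code class="language-plaintext highlighter-rouge">enum class</code> holding every code would still accept the wrong comparison.</p>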
<p>For background on enum class, see the links below (in fact, as things stand, enum class does little to solve the problem we ran into):<br />
https://hownot2code.com/2016/06/14/start-using-enum-class-in-your-code-if-possible/<br />
https://en.cppreference.com/w/cpp/language/enum</p>
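<p>One more point: we cannot define an error code for every single method; that is neither desirable nor feasible. So we also need to get the definition of error codes between modules right. When you consider a method very inner, at the very least do not widen the scope of its error codes (principle 3). Beyond that, use principles 1.5 and 2 and keep iterating to improve the quality of your inter-module error codes, so that you avoid falling back on error codes with a larger scope.
We are also working on a better solution to this problem. Because the approach is not general enough, it may never be supported by the C++ language itself, but we are trying to bring such a proposal to the C++ standards committee. It will most likely not be accepted, since it concerns the core language rather than the library. We are also considering an alternative that does not need to enter the C++ standard: a hint plugin. That is only a static check and cannot provide a very high level of compile-time safety, but it would at least improve the situation somewhat.</p>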
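<h2 id="四-总结">IV. Summary</h2>
<p>Writing compile-time-safe C++ is a basic competency of a modern C++ developer. We need to understand not only why unsafe code is unsafe, but also how to raise the safety of our code one detail at a time. What you need to master here is not a skill, but the principles of developing large modern C++ systems and a meticulous attitude toward development.</p>
<p>In later posts we will go through the rationale behind these principles in detail.</p>Over the past few years I have worked on several large C++ projects. Many contain excellent code, but quite a lot of code was implemented without thinking things through, which hurts its safety, readability, and complexity. Today let's discuss how a large system should return error codes.Papers I Have Read2019-11-01T00:00:00+00:002019-11-01T00:00:00+00:00https://jovany-wang.github.io/2019/11/01/distributed_computing_papers<p class="intro"><span class="dropcap">In </span>this post, I will summarize the papers about distributed computing that I have read. On the one hand, this helps me look things up when I need to; on the other hand, I will give some of my personal views on them, or short summaries.</p>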
<h3 id="1-dague-a-generic-distributed-dag-engine-for-high-performance-computing">1. DAGuE: A generic distributed DAG engine for high performance computing</h3>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Conf: IPDPS
Date: 2011
Author: George Bosilca AND more
Original: https://jovany.wang
</code></pre></div></div>
<h4 id="abstract">Abstract</h4>
<p>The frenetic development of the current architectures places a strain on the current state-of-the-art programming environments. Harnessing the full potential of such architectures has been a tremendous task for the whole scientific computing community.</p>
<p>We present DAGuE a generic framework for architecture aware scheduling and management of micro-tasks on distributed many-core heterogeneous architectures. Applications we consider can be represented as a Direct Acyclic Graph of tasks with labeled edges designating data dependencies. DAGs are represented in a compact, problem-size independent format that can be queried on-demand to discover data dependencies, in a totally distributed fashion. DAGuE assigns computation threads to the cores, overlaps communications and computations and uses a dynamic, fully-distributed scheduler based on cache awareness, data-locality and task priority. We demonstrate the efficiency of our approach, using several micro-benchmarks to analyze the performance of different components of the framework, and a Linear Algebra factorization as a use case.</p>In this post, I will summarize the papers about distributed computing that I have read. On the one hand, this helps me look things up when I need to; on the other hand, I will give some of my personal views on them, or short summaries.The Dst in My Eyes2019-10-26T00:00:00+00:002019-10-26T00:00:00+00:00https://jovany-wang.github.io/2019/10/26/dst-in-my-eyes<p class="intro"><span class="dropcap">At</span> the beginning of this year (2019), our project at work needed to use Redis as a backend store, but we required two capabilities from it: strong consistency and a table concept.</p>
<p>Throughout that work, we had a very strong requirement for strong consistency and a weaker one for the table concept. But without a table concept in Redis, we would have had to implement a table-like concept in our business code, which was troublesome to me. So the idea <strong>to write a lightweight store with strong consistency and a table concept</strong> came naturally. I named it <a href="https://github.com/dst-project/dst">Dst</a>, formed from the first letters of the words <code class="language-plaintext highlighter-rouge">Distributed Store with Table</code>.</p>
<p>The goals I have in mind for Dst are:</p>
<ol>
<li>Redis-like APIs: keep the cost of learning low.</li>
<li>Trade some performance for stronger consistency.</li>
<li>Support a simple table concept.</li>
</ol>
<h3 id="redis-like-apis">Redis-like APIs</h3>
<p>The Redis API feels very natural to use, so it is reasonable for Dst to offer Redis-like APIs. To write a string value to the Dst store, you can use the following commands:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">></span> str.put <span class="s2">"k1"</span> <span class="s2">"v1"</span>
<span class="o">></span> <span class="s2">"ok"</span>
<span class="o">></span> str.get <span class="s2">"k1"</span>
<span class="o">></span> <span class="s2">"v1"</span>
</code></pre></div></div>
<p>Do those look like Redis commands? There is one small difference, though. A Dst command consists of two words: the first, <code class="language-plaintext highlighter-rouge">str</code>, indicates the type of the object, and the second, <code class="language-plaintext highlighter-rouge">put</code>, indicates the operation you want to invoke on it. It looks much like the OO (object-oriented) code you write. In fact, I want to provide graceful APIs that make your commands on the command line look like code.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">></span> str.put<span class="o">(</span><span class="s2">"k1"</span>, <span class="s2">"v1"</span><span class="o">)</span>
<span class="o">></span> <span class="s2">"ok"</span>
<span class="o">></span> str.get<span class="o">(</span><span class="s2">"k1"</span><span class="o">)</span>
<span class="o">></span> <span class="s2">"v1"</span>
</code></pre></div></div>
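<p>The <code class="language-plaintext highlighter-rouge">type.operation</code> form is easy to take apart. The following is a minimal sketch of such a split, not Dst's actual parser; <code class="language-plaintext highlighter-rouge">CommandWord</code> and <code class="language-plaintext highlighter-rouge">parse_command_word</code> are hypothetical names, and <code class="language-plaintext highlighter-rouge">std::optional</code> requires C++17.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical parsed form of a Dst command word such as "str.put".
struct CommandWord {
  std::string type;       // e.g. "str", "dict"
  std::string operation;  // e.g. "put", "get"
};

// Splits "<type>.<operation>" at the first dot.
// Returns std::nullopt when the dot or either part is missing.
std::optional<CommandWord> parse_command_word(const std::string &word) {
  const std::size_t dot = word.find('.');
  if (dot == std::string::npos || dot == 0 || dot + 1 == word.size()) {
    return std::nullopt;
  }
  return CommandWord{word.substr(0, dot), word.substr(dot + 1)};
}
```

<p>For <code class="language-plaintext highlighter-rouge">"str.put"</code> this yields the type <code class="language-plaintext highlighter-rouge">str</code> and the operation <code class="language-plaintext highlighter-rouge">put</code>; a server could then dispatch on the type first and on the operation second.</p>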
<p>The same holds for operations on other types:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">></span> dict.put <span class="s2">"dict_1"</span> <span class="s2">"k1"</span> <span class="s2">"v1"</span> <span class="s2">"k2"</span> <span class="s2">"v2"</span>
<span class="o">></span> <span class="s2">"ok"</span>
<span class="o">></span> dict.get <span class="s2">"dict_1"</span>
<span class="o">></span> <span class="o">{</span><span class="s2">"k1"</span> : <span class="s2">"v1"</span> , <span class="s2">"k2"</span> : <span class="s2">"v2"</span><span class="o">}</span>
<span class="o">></span> dict.get <span class="s2">"dict_1"</span> <span class="s2">"k2"</span>
<span class="o">></span> <span class="s2">"v2"</span>
</code></pre></div></div>
<h3 id="strong-consistency">Strong Consistency</h3>
<p>It is difficult to achieve both strong consistency and great performance; we are limited by the CAP theorem and by the complexity of the implementation. So at that time I was willing to give up a little performance to meet our strong consistency requirement. I could tolerate performing the next action only after a log entry had been synchronized to most of the cluster nodes. That was the actual plan for strong consistency! But even this goal is not easy to achieve. It is unlike a single-node program with one kv store instance. In this distributed design, the kv store instance is just an FSM in the distributed system, and the FSM is one of the lightest parts.</p>
<p>I had previously imagined a design in which every store instance is a RAFT instance in a RAFT group, so that data is synchronized between store instances by RAFT. But I thought RAFT synchronization would add unnecessary cost, which did not seem ideal to me. For the sake of store server performance, I preferred a hand-written <code class="language-plaintext highlighter-rouge">AppendDataLogs</code> over RAFT. A RAFT group would then serve as the coordinator server group, handling master election, requesting recovery from failures, and managing all the store servers. We called this group of coordinators the meta servers, or the meta server group. The meta servers would use RAFT to guarantee consistency among themselves. That would not affect performance, because the meta servers are not used frequently.</p>
<p>You may be curious why I removed RAFT and then added it back. But it is not contradictory at all:
(1) We must guarantee consistency between the meta servers.
(2) We need to synchronize a large amount of data between store servers, so synchronizing that much data by RAFT is not acceptable there, whereas synchronizing data between meta servers by RAFT makes a lot of sense.</p>
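<p>The rule "continue only after the log has reached most of the nodes" can be written down as a small check. This is a sketch under assumptions: I read "most" as a strict majority, the acknowledgment count includes the master's own copy, and the names <code class="language-plaintext highlighter-rouge">majority</code> and <code class="language-plaintext highlighter-rouge">is_committed</code> are made up for illustration rather than taken from Dst.</p>

```cpp
#include <cassert>
#include <cstddef>

// Smallest number of nodes that forms a strict majority of `cluster_size` nodes.
constexpr std::size_t majority(std::size_t cluster_size) {
  return cluster_size / 2 + 1;
}

// A log entry counts as committed once a majority of nodes has acknowledged it.
// `acks` is assumed to include the master's own copy of the entry.
constexpr bool is_committed(std::size_t acks, std::size_t cluster_size) {
  return cluster_size > 0 && acks >= majority(cluster_size);
}
```

<p>With three store servers, the master may proceed as soon as one follower has acknowledged the entry (two copies in total); with five, it needs two followers.</p>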
<h3 id="supporting-table-concept">Supporting table concept</h3>
<p>Most people thought it unreasonable to give Dst a table concept, because using it looks very much like using a relational database. It allows you to define and create tables:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mytables.sc
table task_table <span class="o">{</span>
<span class="o">[</span>p]task_id: string<span class="p">;</span>
<span class="o">[</span>i]driver_id: string<span class="p">;</span>
task_name: string<span class="p">;</span>
return_num: int<span class="p">;</span>
arguments: <span class="o">[</span>string]<span class="p">;</span>
<span class="o">}</span>
table driver_table <span class="o">{</span>
<span class="o">[</span>p]driver_id: string<span class="p">;</span>
driver_name: string<span class="p">;</span>
actor_num: int<span class="p">;</span>
<span class="o">}</span><span class="p">;</span>
<span class="o">></span> table.create task_table driver_table from mytables.sc
<span class="o">></span> <span class="s2">"ok"</span>
</code></pre></div></div>
<p>OK, a table object named <code class="language-plaintext highlighter-rouge">task_table</code> now lives in the Dst system, and we can use it with SQL-like operators.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">></span> task_table.add <span class="s2">"00001"</span>, <span class="s2">"22222"</span>, <span class="s2">"my_task"</span>, 3, <span class="o">[</span><span class="s2">"1"</span>, <span class="s2">"2"</span><span class="o">]</span>
<span class="o">></span> <span class="s2">"ok"</span>
<span class="o">></span> task_table.query <span class="o">(</span><span class="k">*</span><span class="o">)</span> when driver_id <span class="o">==</span> <span class="s2">"22222"</span>
<span class="o">></span> task_id driver_id task_name return_num arguments
<span class="o">></span> <span class="s2">"00001"</span> <span class="s2">"22222"</span> <span class="s2">"my_task"</span> 3 <span class="o">[</span><span class="s2">"1"</span>, <span class="s2">"2"</span><span class="o">]</span>
</code></pre></div></div>
<p>As you can see above, operating on Dst (or on a Dst table) is graceful. At their core, the other Dst types are k-v pairs. The table is likewise stored in memory, and this is the biggest difference between Dst and other relational databases, even though there are some disk-related operations such as snapshots and checkpoints. Your queries themselves are all executed in memory.</p>
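<p>To make the "table on top of in-memory k-v" idea concrete, here is a small sketch of how the task_table from the example could be held in memory. It is a guess at one possible layout, not Dst's implementation: I assume the <code class="language-plaintext highlighter-rouge">[p]</code> marker means primary key and <code class="language-plaintext highlighter-rouge">[i]</code> means secondary index, and <code class="language-plaintext highlighter-rouge">TaskRow</code>, <code class="language-plaintext highlighter-rouge">TaskTable</code>, and <code class="language-plaintext highlighter-rouge">query_by_driver</code> are hypothetical names.</p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical row type mirroring the task_table schema.
struct TaskRow {
  std::string task_id;    // [p]: assumed to mean primary key
  std::string driver_id;  // [i]: assumed to mean secondary index
  std::string task_name;
  int return_num;
  std::vector<std::string> arguments;
};

class TaskTable {
 public:
  // Inserts a row; returns false if the primary key already exists.
  bool add(const TaskRow &row) {
    const bool inserted = rows_.emplace(row.task_id, row).second;
    if (inserted) {
      by_driver_.emplace(row.driver_id, row.task_id);
    }
    return inserted;
  }

  // Returns all rows whose driver_id matches, via the secondary index.
  std::vector<TaskRow> query_by_driver(const std::string &driver_id) const {
    std::vector<TaskRow> result;
    const auto range = by_driver_.equal_range(driver_id);
    for (auto it = range.first; it != range.second; ++it) {
      result.push_back(rows_.at(it->second));
    }
    return result;
  }

 private:
  std::unordered_map<std::string, TaskRow> rows_;      // task_id -> row
  std::multimap<std::string, std::string> by_driver_;  // driver_id -> task_id
};
```

<p>Both maps are ordinary k-v structures, so a query such as <code class="language-plaintext highlighter-rouge">when driver_id == "22222"</code> becomes an index lookup followed by primary-key lookups, all in memory.</p>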
<p>Some people have asked, <code class="language-plaintext highlighter-rouge">Is the memory enough if the table is too large?</code>. I understand the concern, but Dst aims to be just an in-memory k-v store, not a real relational database. It aims to address the question: how can we use Redis (or another in-memory kv store) with a table concept?
As for the question <code class="language-plaintext highlighter-rouge">Is the memory enough if the table is too large?</code>, I do not know yet either.</p>
<p>That is all about Dst. What do you think of it?</p>At the beginning of this year (2019), our project at work needed to use Redis as a backend store, but we required two capabilities from it: strong consistency and a table concept.