Modern C++: Copy and Move

Posted on 2024-05-17 Edited on 2024-09-20

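Preface

Copy and move semantics are the most central, and most basic, body of knowledge that separates C++ from C-style programming. Starting from the much-dreaded lvalues and rvalues, this post walks through how copying and moving work in C++, and uses the interface design of the STL to show how to give up raw pointers and write code that is both safer and faster.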

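Lvalues and Rvalues

How the standard classifies value categories

It is hard to give a definition of lvalue and rvalue that is both precise and easy to digest, so let us first look briefly at how the standard classifies them.
The standard defines the following value categories:

Every expression is either a glvalue (generalized lvalue) or an rvalue, where:

glvalue: covers lvalues and xvalues (expiring values)

rvalue: covers prvalues (pure rvalues) and xvalues

If your head is already spinning, don't worry: this post does not follow that taxonomy when explaining how to handle lvalues and rvalues in day-to-day C++. To keep terms aligned, though, "lvalue" and "rvalue" in this post mean the standard's lvalue and prvalue respectively, which matches the usage in most documentation and open-source communities.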
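To make the three leaf categories a little more tangible, here is a minimal sketch that asks the compiler which category an expression belongs to. It relies on the fact that decltype((expr)) yields T& for an lvalue, T&& for an xvalue, and a non-reference type for a prvalue. The names ValueCategory, VALUE_CATEGORY, and the CategoryOf... functions are made up for this illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Primary template: a non-reference type from decltype((expr)) means prvalue.
template <typename T>
struct ValueCategory {
    static constexpr const char *Name() { return "prvalue"; }
};
// T& from decltype((expr)) means lvalue.
template <typename T>
struct ValueCategory<T &> {
    static constexpr const char *Name() { return "lvalue"; }
};
// T&& from decltype((expr)) means xvalue.
template <typename T>
struct ValueCategory<T &&> {
    static constexpr const char *Name() { return "xvalue"; }
};

// Hypothetical helper macro; the double parentheses inside decltype matter.
#define VALUE_CATEGORY(expr) (ValueCategory<decltype((expr))>::Name())

std::string CategoryOfVariable() {
    int i = 0;
    return VALUE_CATEGORY(i);             // a named variable
}

std::string CategoryOfMovedVariable() {
    int i = 0;
    return VALUE_CATEGORY(std::move(i));  // the result of std::move
}

std::string CategoryOfLiteral() {
    return VALUE_CATEGORY(42);            // an integer literal
}
```

With these definitions, assert(CategoryOfVariable() == "lvalue"), assert(CategoryOfMovedVariable() == "xvalue"), and assert(CategoryOfLiteral() == "prvalue") all hold.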

We continue to use the familiar Foo class from the previous installment as the running example:

#include <iostream>
#include <memory>
#include <string>
#include <utility>

class Foo {
public:
    Foo() {
        std::cout << "Foo construct." << std::endl;
    }

    Foo(const std::string &data) : m_str(data) {
        std::cout << "Foo(" << m_str << ") construct." << std::endl;
    }
    
    ~Foo() {
        std::cout << "Foo(" << m_str << ") desctruct." << std::endl;
    }

    Foo(const Foo &other) {
        std::cout << "Foo construct by lvalue." << std::endl;
        m_str = other.m_str;
    }

    Foo(Foo &&other) {
        std::cout << "Foo construct by rvalue." << std::endl;
        m_str = std::move(other.m_str);
    }

    Foo &operator=(const Foo &other) {
        std::cout << "Foo copy assign." << std::endl;
        if (this != &other) {
            m_str = other.m_str;
        }
        return *this;
    }

    Foo &operator=(Foo &&other) {
        std::cout << "Foo move assign." << std::endl;
        if (this != &other) {
            m_str = std::move(other.m_str);
        }
        return *this;
    }

    std::string m_str = "";
};

How to think about lvalues and rvalues

The "left" and "right" are usually explained in terms of assignment in C++. For example:

int i;
i = 7;
std::string s;
s = "abc";

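The left-hand sides, i and s, are variables and are usually regarded as lvalues; the right-hand sides, the literals 7 and "abc", are usually regarded as rvalues. This explains where "left" and "right" come from positionally. (Strictly speaking, the standard classifies a string literal such as "abc" as an lvalue, because it is an array with storage; the positional intuition is only a starting point.)
More concretely, an lvalue usually denotes something that can be assigned to, and an rvalue denotes something that can be assigned from.

Let us unpack those two statements a little further:

An lvalue usually denotes something that can be assigned to: this implies that an lvalue has a definite address in memory.
An rvalue usually denotes something that can be assigned from: this implies that an rvalue plays the role of a "value"; it may have no memory address, or its address simply does not matter.

(This explanation is not perfect, but please settle for this level of understanding for now!)

As you know, & declares an lvalue reference and && declares an rvalue reference. Below are some broader uses of each.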

Lvalues

#include <iostream>

int a = 1;
// Initialize b as an lvalue reference to a (a reference is bound; no copy is made)
int &b = a;
// b refers to a, so modifying a is visible through b.
a = 999;
std::cout << b << std::endl;

Scenarios involving classes:

void SomeFunctionUseLvalue(const Foo &foo) {
    std::cout << "foo is lvalue." << std::endl;
}

void SomeFunctionUseRvalue(Foo &&foo) {
    std::cout << "foo is rvalue." << std::endl;
}

// With class types, foo1 is likewise an lvalue
Foo foo1{};
SomeFunctionUseLvalue(foo1);
SomeFunctionUseRvalue(Foo{});

Foo construct.
foo is lvalue.
Foo construct.
foo is rvalue.
Foo() desctruct.

Rvalues

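// An rvalue reference can be initialized from an integer literal
int &&c = 1;
int a = 1;
int &b = a;
// An rvalue reference cannot bind to an lvalue (b refers to a, and the expression b is an lvalue)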
int &&d = b;

input_line_15:7:7: error: rvalue reference to type 'int' cannot bind to lvalue of type 'int'
int &&d = b;
      ^   ~

Interpreter Error:
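The binding rules behind that error can also be checked at compile time. The sketch below uses std::is_constructible as a stand-in for "can a reference of this type be initialized from that kind of expression"; CanBind is a hypothetical helper name, and modelling an lvalue as int& and an rvalue as int is a convention of this example.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper: can a reference of type Ref be initialized from an
// expression described by Arg? (Arg = int& models an lvalue, Arg = int an rvalue.)
template <typename Ref, typename Arg>
constexpr bool CanBind() {
    return std::is_constructible<Ref, Arg>::value;
}

static_assert(CanBind<int &, int &>(), "int& binds to an lvalue");
static_assert(!CanBind<int &, int>(), "int& does not bind to an rvalue");
static_assert(CanBind<const int &, int &>(), "const int& binds to an lvalue");
static_assert(CanBind<const int &, int>(), "const int& also binds to an rvalue");
static_assert(CanBind<int &&, int>(), "int&& binds to an rvalue");
static_assert(!CanBind<int &&, int &>(), "int&& does not bind to an lvalue");
```

The last static_assert corresponds to the failing line int &&d = b; above: if the rule were different, the translation unit would not compile.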

{
    Foo foo1{"foo1"};
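    // std::move casts an lvalue to an rvalue; after the move construction below, foo1 is left in a valid but unspecified state (tools may warn about use-after-move)
    Foo movedFoo1 = std::move(foo1);
    // An unnamed temporary object is an rvalue
    Foo &&rvalueFoo = Foo{"foo2"};
    // rvalueFoo has a name, so the expression is an lvalue: this line copies, despite the variable name
    Foo movedFoo2 = rvalueFoo;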
}

Foo(foo1) construct.
Foo construct by rvalue.
Foo(foo2) construct.
Foo construct by lvalue.
Foo(foo2) desctruct.
Foo(foo2) desctruct.
Foo(foo1) desctruct.
Foo() desctruct.
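The "Foo construct by lvalue." line in that log is easy to miss, so here is a small sketch that isolates it. Probe is a hypothetical type for this example that records which constructor produced it; it shows that a named rvalue reference is an lvalue when used, and that std::move is needed to move from it.

```cpp
#include <cassert>
#include <utility>

// Hypothetical type that remembers how it was constructed.
struct Probe {
    enum class Origin { Fresh, Copied, Moved };
    Origin origin = Origin::Fresh;

    Probe() = default;
    Probe(const Probe &) : origin(Origin::Copied) {}
    Probe(Probe &&) noexcept : origin(Origin::Moved) {}
};

Probe::Origin InitFromNamedRvalueRef() {
    Probe &&ref = Probe{};
    Probe target = ref;             // ref has a name, so this selects the copy constructor
    return target.origin;
}

Probe::Origin InitFromMovedRvalueRef() {
    Probe &&ref = Probe{};
    Probe target = std::move(ref);  // the cast makes it an rvalue again, so this moves
    return target.origin;
}
```

InitFromNamedRvalueRef() returns Probe::Origin::Copied and InitFromMovedRvalueRef() returns Probe::Origin::Moved.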

The compiler picks the overload by value category

void SomeFunction(Foo &foo) {
    std::cout << "this is a lvalue." << std::endl;
}

void SomeFunction(Foo &&foo) {
    std::cout << "this is a rvalue." << std::endl;
}

Foo lFoo{};
SomeFunction(lFoo);
SomeFunction(Foo());

Foo construct.
this is a lvalue.
Foo construct.
this is a rvalue.
Foo() desctruct.

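The essence of copy and move

A disclaimer up front:

We often admire how flexibly C++ lets us control its syntactic sugar, to the point that such a free-form language frequently leaves us dizzy. I cannot promise to explain the "essence" of copy and move perfectly (there are different schools of thought in the open-source community, and the debates among experts get fairly heated), but I will do my best to present the relatively mainstream school, the ownership school represented by the STL, as the recommended paradigm.

Copy and move in C++ are only a formal contract

The assignment operator (=) in C++ is overloadable. When the developer does not define it, the compiler will in most cases generate the copy assignment and move assignment operators for the class automatically. The rules governing that implicit generation are fairly intricate, so this series generally advises against leaving it to the compiler (a separate installment covers what the compiler does).

So whenever copying or moving matters, we recommend that developers implement the copy and move assignment operators themselves. That in turn creates some problems. Taking Foo as the example, a normal copy and move implementation looks like this: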

// Copy assignment
Foo &operator=(const Foo &other) {
    std::cout << "Foo copy assign." << std::endl;
    if (this != &other) {
        m_str = other.m_str;
    }
    return *this;
}
// Move assignment
Foo &operator=(Foo &&other) {
    std::cout << "Foo move assign." << std::endl;
    if (this != &other) {
        m_str = std::move(other.m_str);
    }
    return *this;
}

But the compiler does not care at all what each of them actually does. You can even write things like this:

// Treat copy as move (undefined behavior if other refers to an object that is actually const)
Foo &operator=(const Foo &other) {
    std::cout << "Foo copy assign." << std::endl;
    auto &casted = const_cast<Foo &>(other);
    if (this != &other) {
        m_str = std::move(casted.m_str);
    }
    return *this;
}
// Treat move as copy
Foo &operator=(Foo &&other) {
    std::cout << "Foo move assign." << std::endl;
    if (this != &other) {
        m_str = other.m_str;
    }
    return *this;
}

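Code like this would make any committer smash the keyboard, yet the compiler will not object at all. It really is that permissive! For this reason, the design of copy and move is, in large projects, a gentleman's agreement that developers are expected to honor.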
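Since the compiler will not enforce the agreement, one option is to check it in a unit test. The following is a minimal sketch of that idea, not a complete test strategy: GoodLabel, BadLabel, and CopyAssignLeavesSourceIntact are hypothetical names, and BadLabel steals through swap (rather than std::move) so that the effect on the source is deterministic.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Honors the contract: copy assignment leaves the source untouched.
struct GoodLabel {
    std::string text;
    explicit GoodLabel(std::string t = "") : text(std::move(t)) {}
    GoodLabel &operator=(const GoodLabel &other) {
        if (this != &other) {
            text = other.text;
        }
        return *this;
    }
};

// Breaks the contract: "copy" assignment steals from the source.
struct BadLabel {
    std::string text;
    explicit BadLabel(std::string t = "") : text(std::move(t)) {}
    BadLabel &operator=(const BadLabel &other) {
        if (this != &other) {
            // Swapping through const_cast silently takes the source's contents.
            text.swap(const_cast<BadLabel &>(other).text);
        }
        return *this;
    }
};

// Contract check: after `target = source`, both must hold the original value.
template <typename Label>
bool CopyAssignLeavesSourceIntact() {
    Label source("payload");
    Label target;
    target = source;
    return source.text == "payload" && target.text == "payload";
}
```

CopyAssignLeavesSourceIntact<GoodLabel>() returns true, while CopyAssignLeavesSourceIntact<BadLabel>() returns false because the source ends up empty. Both versions compile without complaint, which is exactly the point made above.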

Copying in the ownership school

Let us first observe what happens during a copy.

{
    Foo foo1{"foo1"};
    Foo foo2{"foo2"};
    // Calls the copy constructor; here it is equivalent to Foo copied(foo1);
    Foo copied = foo1;
    // Calls copy assignment
    copied = foo2;
}

Foo(foo1) construct.
Foo(foo2) construct.
Foo construct by lvalue.
Foo copy assign.
Foo(foo2) desctruct.
Foo(foo2) desctruct.
Foo(foo1) desctruct.

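From the log we can draw a first conclusion:

Copying at variable initialization actually calls the copy constructor.

So we always recommend that the copy constructor and the copy assignment operator be defined as a pair.

Also, copy construction produces a genuinely new copy of Foo. The three objects here are independent and do not affect one another; they are logically separate entities, and each has ownership of its own data.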

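Moving in the ownership school

Before talking about moving, let us take a fresh look at what std::move is.

Reading the source shows that std::move is essentially just a static_cast (excerpted from LLVM's libc++):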

template <class _Tp>
inline _LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR
typename remove_reference<_Tp>::type&&
move(_Tp&& __t) _NOEXCEPT
{
    // Strip the reference qualifier to get the underlying type
    typedef _LIBCPP_NODEBUG_TYPE typename remove_reference<_Tp>::type _Up;
    // static_cast to an rvalue reference
    return static_cast<_Up&&>(__t);
}

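This is something the community complains about a lot: std::move does nothing that matches the everyday meaning of the word "move", and the object's memory address does not change. It merely uses static_cast to turn an lvalue reference into an rvalue reference, which steers overload resolution toward the function that takes an rvalue reference.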
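Two consequences of "std::move is only a cast" can be checked directly. The sketch below (function names are made up for this example) shows that the cast alone leaves the object and its address untouched, and that applying std::move to a const object quietly falls back to a copy, because const std::string&& cannot bind to the move constructor's std::string&& parameter.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// The cast yields a reference to the very same object; nothing is moved yet.
bool MoveKeepsAddressAndContent() {
    std::string s = "payload";
    std::string &&r = std::move(s);
    return &r == &s && s == "payload";
}

// std::move on a const object produces const std::string&&, so the
// copy constructor is selected and the source keeps its value.
bool MoveOfConstFallsBackToCopy() {
    const std::string s = "payload";
    std::string t = std::move(s);
    return s == "payload" && t == "payload";
}

static_assert(
    std::is_same<decltype(std::move(std::declval<const std::string &>())),
                 const std::string &&>::value,
    "std::move preserves const");
```

Both functions return true. The second case is a common source of accidental copies: the code says move, but the const qualifier makes it a copy.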

Let us look at a concrete example:

{
    Foo foo{"foo"};
    Foo movedFoo = std::move(foo);
    movedFoo.m_str = "movedFoo";
}

Foo(foo) construct.
Foo construct by rvalue.
Foo(movedFoo) desctruct.
Foo() desctruct.

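Likewise, the "Foo construct by rvalue" log line tells us that the move constructor is what actually gets called, which gives our first conclusion:

Moving at variable initialization actually calls the move constructor.

So we always recommend that the move constructor and the move assignment operator be defined as a pair.

We can see that this process involves two constructions and two destructions. Of the two destructions, movedFoo is destroyed normally, but an empty Foo is destroyed as well. That is because std::move turned foo into an rvalue and ownership of its contents passed to movedFoo; the empty Foo is the moved-from foo, which is still a valid object but holds nothing of value and should not be read again before it is reassigned. In the end there is only one meaningful object, and ownership has been transferred from foo to movedFoo.

So although std::move by itself moves nothing, we still achieved a transfer of ownership.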
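The same ownership transfer is what std::unique_ptr builds on, and there the moved-from state is precisely specified: the source pointer becomes null. A minimal sketch, with TransferResult and TransferOwnership as hypothetical names:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct TransferResult {
    bool sourceIsEmpty;
    std::string valueAtDestination;
};

TransferResult TransferOwnership() {
    std::unique_ptr<std::string> source = std::make_unique<std::string>("foo");
    // unique_ptr is move-only: ownership passes to destination, source becomes null.
    std::unique_ptr<std::string> destination = std::move(source);
    return TransferResult{source == nullptr, *destination};
}
```

After the call, sourceIsEmpty is true and valueAtDestination is "foo": exactly one owner remains, with no raw pointer in sight.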

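Not so fast; let us look at one more example.

void SomeMoveFunction(Foo &&foo) {
    // foo has a name, so it is an lvalue here; std::move is needed to actually move from it
    Foo movedFoo = std::move(foo);
    std::cout << "foo move to temperary scope, and it will destruct." << std::endl;
}

Foo foo{"foo"};

SomeMoveFunction(std::move(foo));

Foo(foo) construct.
Foo construct by rvalue.
foo move to temperary scope, and it will destruct.
Foo(foo) desctruct.

Inside SomeMoveFunction, ownership of foo's contents is taken over by movedFoo, so the data that originally belonged to foo is destroyed when movedFoo goes out of scope at the end of SomeMoveFunction. The caller's foo still exists as an empty, moved-from object until its own scope ends. This gives us another important conclusion:

Once ownership has been transferred, the lifetime of the resource is managed by the new owner.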
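A common way to express "this function takes over the resource" in an interface is a sink parameter that takes std::unique_ptr by value. The sketch below is one possible illustration under that assumption; Tracked, Consume, SinkResult, and PassToSink are hypothetical names, and Tracked reports its destruction through a flag so the timing can be observed.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical resource that records its own destruction in an external flag.
class Tracked {
public:
    explicit Tracked(bool &destroyedFlag) : m_destroyed(destroyedFlag) {}
    ~Tracked() { m_destroyed = true; }

private:
    bool &m_destroyed;
};

// A sink: taking the unique_ptr by value means this function owns the resource,
// which is released when the parameter is destroyed.
void Consume(std::unique_ptr<Tracked> item) {
    (void)item;
}

struct SinkResult {
    bool destroyedAfterCall;
    bool callerPointerEmpty;
};

SinkResult PassToSink() {
    bool destroyed = false;
    auto item = std::make_unique<Tracked>(destroyed);
    Consume(std::move(item));  // ownership moves into the function
    return SinkResult{destroyed, item == nullptr};
}
```

PassToSink() returns {true, true}: the resource was released by the callee, and the caller is left with an empty pointer rather than a dangling one.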

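(Optional) Perfect forwarding of lvalues and rvalues

In real code, things can get more complicated than what we described above. Suppose, for instance, that we are developing a shared dependency module. Forcing callers of its public interface to pass only lvalues, or only rvalues, is a pretty unfriendly design.

Before explaining perfect forwarding, consider this example: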

void ImplFunction(const Foo &foo) {
    std::cout << "Call with lvalue." << std::endl;
}

void ImplFunction(Foo &&foo) {
    std::cout << "Call with rvalue." << std::endl;
}

template <typename FooType>
void SomeCommonFunction(FooType &&foo) {
    ImplFunction(foo);
}

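Clearly, SomeCommonFunction is designed as the public entry point. Recall the rules for what a function parameter can accept:

Non-const lvalue reference parameter (Foo &): accepts only non-const lvalues
Const lvalue reference parameter (const Foo &): accepts lvalues as well as rvalues such as temporaries and literals
Forwarding ("universal") reference (T && with a deduced T): accepts all of the above, lvalues and rvalues alike; a plain rvalue reference such as Foo && accepts only rvalues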
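What makes the forwarding reference accept everything is template argument deduction: for an lvalue argument T is deduced as an lvalue reference type, and for an rvalue argument T is deduced as a plain type. The sketch below makes that visible; DeducedAsLvalueReference and the With... functions are hypothetical names for this example.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Reports whether T was deduced as an lvalue reference type.
template <typename T>
bool DeducedAsLvalueReference(T &&) {
    return std::is_lvalue_reference<T>::value;
}

bool WithNamedVariable() {
    std::string s = "x";
    return DeducedAsLvalueReference(s);             // T = std::string&
}

bool WithConstVariable() {
    const std::string s = "x";
    return DeducedAsLvalueReference(s);             // T = const std::string&
}

bool WithTemporary() {
    return DeducedAsLvalueReference(std::string("x"));  // T = std::string
}

bool WithMovedVariable() {
    std::string s = "x";
    return DeducedAsLvalueReference(std::move(s));  // T = std::string
}
```

The first two functions return true and the last two return false. That deduced T is the piece of information std::forward later uses to restore the original value category.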

Let us observe which ImplFunction the compiler actually calls in each scenario.

{
    // call with lvalue
    Foo foo{"foo"};
    SomeCommonFunction(foo);
}

Foo(foo) construct.
Call with lvalue.
Foo(foo) desctruct.

{
    // call with rvalue
    Foo foo{"foo"};
    SomeCommonFunction(std::move(foo));
}

Foo(foo) construct.
Call with lvalue.
Foo(foo) desctruct.

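OK, we have found a serious problem: whether we pass an lvalue or an rvalue, the call always resolves to the lvalue overload. This is because an rvalue passed through a "universal reference" becomes a named variable, and even though it is declared as an rvalue reference, it is not treated as an rvalue:

Inside any function, a parameter declared as an rvalue reference is still treated as an lvalue when it is used by name.

Perfect forwarding (std::forward) was introduced to solve exactly this problem. In fact, the STL containers we use every day, such as list, map, and vector, rely heavily on perfect forwarding in pursuit of maximum performance.
We only need to make a small change to the function: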

template <typename FooType>
void SomeCommonFunctionWithForward(FooType &&foo) {
    ImplFunction(std::forward<FooType>(foo));
}

{
    // call with lvalue
    Foo foo{"foo"};
    SomeCommonFunctionWithForward(foo);
}

Foo(foo) construct.
Call with lvalue.
Foo(foo) desctruct.

{
    // call with rvalue
    Foo foo{"foo"};
    SomeCommonFunctionWithForward(std::move(foo));
}

Foo(foo) construct.
Call with rvalue.
Foo(foo) desctruct.
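The same pattern pays off when the forwarded argument is actually stored somewhere. As a rough sketch, assuming a hypothetical Probe type that records how it was constructed and a hypothetical Append helper, forwarding into std::vector::push_back copies for an lvalue and moves for an rvalue:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical type that remembers how it was constructed.
struct Probe {
    enum class Origin { Fresh, Copied, Moved };
    Origin origin = Origin::Fresh;

    Probe() = default;
    Probe(const Probe &) : origin(Origin::Copied) {}
    Probe(Probe &&) noexcept : origin(Origin::Moved) {}
};

// Forwards the argument so push_back sees the caller's original value category.
template <typename Value>
void Append(std::vector<Probe> &items, Value &&value) {
    items.push_back(std::forward<Value>(value));
}

Probe::Origin AppendLvalue() {
    std::vector<Probe> items;
    Probe probe;
    Append(items, probe);      // lvalue: push_back(const Probe&) copies
    return items.back().origin;
}

Probe::Origin AppendRvalue() {
    std::vector<Probe> items;
    Append(items, Probe{});    // rvalue: push_back(Probe&&) moves
    return items.back().origin;
}
```

AppendLvalue() returns Probe::Origin::Copied and AppendRvalue() returns Probe::Origin::Moved. Without std::forward, both calls would copy.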

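The implementation of std::forward is also quite simple and relies on the compiler's reference collapsing rules; for reasons of space, this post does not go into it in detail.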
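For readers who want a rough idea anyway, here is a simplified sketch. MyForward is a hypothetical stand-in, not the library's actual code: the real std::forward also has an overload for rvalue arguments, which is omitted here. WithRvalueRef, Which, and Relay are likewise names invented for the example.

```cpp
#include <cassert>
#include <type_traits>

// Reference collapsing: applying && to an lvalue reference type gives an
// lvalue reference; in every other case the result is an rvalue reference.
template <typename T>
using WithRvalueRef = T &&;

static_assert(std::is_same<WithRvalueRef<int &>, int &>::value, "& then && collapses to &");
static_assert(std::is_same<WithRvalueRef<int &&>, int &&>::value, "&& then && collapses to &&");
static_assert(std::is_same<WithRvalueRef<int>, int &&>::value, "plain type becomes &&");

// Simplified stand-in for std::forward: the cast target T&& collapses to an
// lvalue reference when T is an lvalue reference, and to an rvalue reference otherwise.
template <typename T>
constexpr T &&MyForward(typename std::remove_reference<T>::type &arg) noexcept {
    return static_cast<T &&>(arg);
}

int Which(int &) { return 1; }   // lvalue overload
int Which(int &&) { return 2; }  // rvalue overload

template <typename T>
int Relay(T &&value) {
    return Which(MyForward<T>(value));
}

int RelayLvalue() {
    int x = 0;
    return Relay(x);  // T = int&, forwarded as an lvalue
}

int RelayRvalue() {
    return Relay(0);  // T = int, forwarded as an rvalue
}
```

RelayLvalue() returns 1 and RelayRvalue() returns 2, so the value category chosen by the caller survives the extra function layer.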

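Summary

It is fair to say that copy and move semantics in C++ overturn the traditional C-style programming paradigm: they greatly reduce the need for raw pointers and give the compiler more to check at compile time, which improves code safety.

Starting from the classification of lvalues and rvalues, we described the shared contract for copy and move under the "ownership" framework, and "complained" (jokingly) that the C++ compiler does not enforce this contract as a formal rule. We also explained why the copy constructor and copy assignment, and likewise the move constructor and move assignment, are generally expected to be implemented in pairs.

Then, drawing on real-world scenarios, we looked at the more complex parameter design that comes up when building shared components, introduced the concept of "perfect forwarding", and showed how such public interfaces can be handled.

Happily, nowhere in this installment did we access or manipulate a raw pointer, which is exactly what modern C++ tries hard to avoid. Copy and move semantics give the compiler stronger checking ability: with all warnings enabled and a build that finishes with 0 errors and 0 warnings, many unsafe raw-pointer operations will typically be caught at compile time, and the ride is as smooth as a veteran driver's.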