C/C++ Tutorial, Goal-Oriented Style (4)

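Object-Oriented Programming: A Summary

This post covers two slightly more advanced but still very commonly used things, smart pointers and threads, with a brief look at templates at the end. Both the smart pointers discussed here and std::thread were introduced in C++11. Before that, let's wrap up the object-oriented material from the previous chapter. In textbook terms (lol), the four major features of object-oriented programming are:

Encapsulation
- Bundle the data and the functions that belong to one concept together
Abstraction
- Hide what is not important
  - Use access specifiers (public / protected / private) to hide data and functions that callers do not need
  - Use header files to hide the implementation of an object
Polymorphism
- The same function name has different implementations depending on its parameters (function overloading)
- Operator overloading gives + - = * [] different behavior
- A derived class can also override a virtual function of its base class (runtime polymorphism)
Inheritance
- The Moveable / Monster / Player relationship in the previous post is a good example

These descriptions are on the theoretical side, but they are keywords you will see all the time when writing object-oriented code, so it doesn't hurt to understand them.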
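
As a small illustration of the polymorphism items in the list, here is a minimal sketch of function overloading and operator overloading. Vec2 and describe are made-up names for this example only; they do not come from the previous post.

```cpp
#include <string>

// Hypothetical type for illustration only.
struct Vec2
{
    int x;
    int y;
};

// operator overloading: give '+' a meaning for Vec2
Vec2 operator+( const Vec2& a, const Vec2& b )
{
    return Vec2{ a.x + b.x, a.y + b.y };
}

// function overloading: same name, different parameter types
std::string describe( int )
{
    return "int";
}

std::string describe( const std::string& )
{
    return "string";
}
```

The compiler picks the matching describe from the argument type, and Vec2{1, 2} + Vec2{3, 4} calls the operator+ defined above.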

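Smart Pointer

Acquiring and releasing heap memory by hand makes memory leaks very easy. For reasons of efficiency and generality, C++ did not adopt a garbage collector for memory management the way Java did. But is there really no better way?

Consider the solution below.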

class AutoIntArray
{
public:
    AutoIntArray( int* data ) 
    {
        data_ = data;
    }
    
    ~AutoIntArray() 
    {
        delete[] data_; // allocated with new[], so release with delete[]
    }
private:    
    int* data_;
};

int main()
{
    int* data = new int[100];
    AutoIntArray intarray( data );

    // no more 'delete' at the end!
    return 0;
}

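The AutoIntArray object is destroyed automatically at the end of main, and destroying it calls its destructor. We use this mechanism to release the memory inside the destructor, so we no longer have to release it ourselves. This idea is the basis of smart pointers.

There are three kinds of smart pointer: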

std::unique_ptr
std::shared_ptr
std::weak_ptr

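unique_ptr: cannot be copied

The Singleton Pattern in the previous post showed how to make an object impossible to copy. unique_ptr is a smart pointer built on the same mechanism. Because it cannot be copied, you never have to worry that some other copy of the pointer still needs the memory when the destructor releases it.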
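
To make the "cannot be copied" idea concrete, here is a minimal sketch of the earlier AutoIntArray with copying disabled. UniqueIntArray is a made-up name, and this is only an illustration of the mechanism, not how std::unique_ptr is actually implemented.

```cpp
#include <cstddef>
#include <memory>
#include <type_traits>

// Hypothetical class for illustration: owns an int array and refuses to be copied.
class UniqueIntArray
{
public:
    explicit UniqueIntArray( std::size_t size ) : data_( new int[size]() ) {}
    ~UniqueIntArray() { delete[] data_; }

    // deleting these two members is what makes the object non-copyable
    UniqueIntArray( const UniqueIntArray& ) = delete;
    UniqueIntArray& operator=( const UniqueIntArray& ) = delete;

    int& operator[]( std::size_t i ) { return data_[i]; }

private:
    int* data_;
};

// both checks are evaluated at compile time
static_assert( !std::is_copy_constructible<UniqueIntArray>::value,
               "copying is disabled" );
static_assert( !std::is_copy_constructible<std::unique_ptr<int[]>>::value,
               "std::unique_ptr cannot be copied either" );
```

With the copy operations deleted, two objects can never end up owning the same array, so the destructor can safely call delete[].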

For example, here is how unique_ptr is used. We create a unique_ptr and hand it an array directly; at the end of main, the memory is released together with the unique_ptr.

#include <memory> // smart pointer
#include <iostream> // cout

int main()
{
    std::unique_ptr<int[]> board( new int[100] );
    board[10] = 5;
    std::cout << board[10] << std::endl;
    return 0;
}

But if you try to pass a unique_ptr to a function by value, compilation fails, because its copy constructor is disabled (the same mechanism as in the Singleton).

#include <memory> // smart pointer

void func(std::unique_ptr<int[]> board)
{
}

int main()
{
    std::unique_ptr<int[]> board;
    board = std::make_unique<int[]>(100); // available since C++14; avoids writing 'new'
    func( board ); // error! pass-by-value needs a copy, and unique_ptr cannot be copied
    return 0;
}

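So unique_ptr is typically used for data that is created and destroyed along with some object, or for memory that is allocated and freed inside a single function.

Can a unique_ptr be passed into a function at all? Yes, there is a way: ownership has to be transferred to another unique_ptr with std::move (search for 'std::unique_ptr + std::move'). Unless you really know what you are doing, though, don't use this casually, since it adds complexity and hurts readability. Data that needs to be passed between functions can use the shared_ptr described next.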
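
For reference, the ownership transfer mentioned above can be sketched as follows. The function names consume, peek and demo_move are made up for this example. peek shows the simpler alternative of lending out the raw pointer with get() while keeping ownership.

```cpp
#include <memory>
#include <utility> // std::move

// takes ownership: the array is freed when 'board' goes out of scope here
int consume( std::unique_ptr<int[]> board )
{
    return board[10];
}

// borrows only: the caller keeps ownership of the array
int peek( const int* board )
{
    return board[10];
}

int demo_move()
{
    std::unique_ptr<int[]> board( new int[100]() );
    board[10] = 5;

    int seen = peek( board.get() );             // no ownership transfer
    int result = consume( std::move( board ) ); // ownership moves into consume()

    // after the move, 'board' is empty (it holds nullptr)
    return ( board == nullptr ) ? seen + result : -1;
}
```

After std::move the original unique_ptr must not be dereferenced any more, which is a large part of why the technique costs readability.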

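shared_ptr: can be copied

Before getting into shared_ptr, one big problem has to be mentioned: shared_ptr cannot correctly delete a C-style array (at least before C++17, which added shared_ptr<T[]>). Unlike unique_ptr, a shared_ptr<T> does not distinguish a single object from an array; it only calls delete, never delete[]. shared_ptr can handle arrays with a custom deleter, but when the language design has a flaw like this, my advice is to give up on putting C arrays in a shared_ptr altogether.

So how do we get shared_ptr to work with a C array? The section 'So, do we still need new and delete?' below explains this in detail.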
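
For completeness, the custom deleter route mentioned above can be sketched like this. make_shared_int_array is a made-up helper name; std::default_delete<int[]> is the standard deleter that calls delete[].

```cpp
#include <cstddef>
#include <memory>

// Hypothetical helper: a shared_ptr<int> that owns an array and frees it with delete[].
std::shared_ptr<int> make_shared_int_array( std::size_t size )
{
    // the second argument replaces the default 'delete' with 'delete[]'
    return std::shared_ptr<int>( new int[size](), std::default_delete<int[]>() );
}
```

Elements are then reached through the raw pointer, for example p.get()[3] = 42. This clumsiness is one more reason to prefer std::vector, as the later section does.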

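The difference between shared_ptr and unique_ptr is that shared_ptr has a counter-like mechanism inside. Every copy (copy constructor) adds 1 to the counter, every destruction (destructor) subtracts 1, and when the counter reaches 0 the resource is released.

As in the example below: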

#include <iostream>
#include <memory>

class Board
{
public:
    Board() {}    
    ~Board() {}
};

void func(std::shared_ptr<Board> board)
{
    // now the count is 2
    std::cout << "count: " << board.use_count() << std::endl;
    
    // make another copy
    std::shared_ptr<Board> copy_board = board;

    // both counts are 3    
    std::cout << "count: " << board.use_count() << std::endl;
    std::cout << "count: " << copy_board.use_count() << std::endl;
    
    // count = count - 2 after leaving func
}

int main()
{
    // NOTE: template type is 'Board' instead of 'Board*'
    std::shared_ptr<Board> board = std::make_shared<Board>();
    
    // after creation, the count should be 1
    std::cout << "count: " << board.use_count() << std::endl;
    
    // pass-by-value makes a copy
    func( board );
    
    // after leaving func, the count is back to 1
    std::cout << "count: " << board.use_count() << std::endl;
    
    return 0;
}

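Avoid circular references!

Note that two shared_ptr pointing at each other will keep both counts from ever reaching 0.

weak_ptr: does not increase the count

A weak_ptr never exists on its own. It is created from a shared_ptr, but it does not increase the count of the original shared_ptr. If one side of the relationship uses weak_ptr, the circular reference problem above goes away.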
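
Here is a minimal sketch of two objects that refer to each other through weak_ptr. Node and its alive counter are made up for this example; the counter only exists so the result can be observed.

```cpp
#include <memory>

// Hypothetical node type for illustration.
struct Node
{
    static int alive;          // number of Node objects not yet destroyed
    std::weak_ptr<Node> other; // with std::shared_ptr<Node> here, the pair would never be freed

    Node() { ++alive; }
    ~Node() { --alive; }
};
int Node::alive = 0;

int nodes_left_after_scope()
{
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->other = b; // weak links: the use_count of a and b stays at 1
        b->other = a;
    }
    // both nodes were destroyed when a and b went out of scope
    return Node::alive;
}
```

Because the links are weak, nodes_left_after_scope() returns 0. Changing other to a shared_ptr would leave both nodes alive after the scope ends.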

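weak_ptr has two important methods. One is expired(), which tells you whether the original shared_ptr still exists; the other is lock(), which gives you the original shared_ptr back. Once you have it, everything works exactly as with any shared_ptr. An example follows.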

#include <iostream>
#include <memory>

class Board
{
public:
    Board() {}
    ~Board() {}
};

int main()
{
    std::shared_ptr<Board> board = std::make_shared<Board>();
    std::weak_ptr<Board> weak_board( board );
    
    if ( weak_board.expired() )
    {
        std::cout << "shared_ptr no longer exists!" << std::endl;
    }
    else
    {
        // the shared_ptr is still alive, so we can get it back
        std::shared_ptr<Board> board_original = weak_board.lock();
        
        // do anything you want with the board
    }
    
    return 0;
}
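
When other threads may release the object, a commonly used variation is to skip expired() and test the result of lock() directly, since lock() returns an empty shared_ptr if the object is already gone. A minimal sketch, where this Board with a size member and size_or_minus_one are made up for the example:

```cpp
#include <memory>

// Hypothetical Board with one data member, for illustration only.
struct Board
{
    int size = 8;
};

// returns the board size, or -1 if the board no longer exists
int size_or_minus_one( const std::weak_ptr<Board>& weak_board )
{
    // lock() hands back a usable shared_ptr, or an empty one if the object is gone
    if ( auto board = weak_board.lock() )
    {
        return board->size;
    }
    return -1;
}
```

The check and the access happen through the same shared_ptr, so the object cannot disappear in between.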

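Smart Pointer Performance

unique_ptr is almost exactly as fast as a manual delete
shared_ptr has to maintain the count, so it costs a few more operations than a manual delete

So, do we still need new and delete?

One school of thought holds that modern C++ should contain no new / delete at all, because new / delete are unsafe. The biggest problem with dropping new / delete entirely is how to talk to existing C libraries. Below is an example of a C array and shared_ptr working together.

The original C-style code feat. C++ vector 🙂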

#include <memory>
#include <vector>

// function with C array parameter
void func( int* data )
{
    // do something
}

int main()
{    
    // create an int array of size 64
    int* data = new int[64];

    std::vector<int*> vec;
    
    // push one data into vector
    vec.push_back( data );
    
    // get the C array inside the vector
    func( vec.at(0) );

    delete [] data;
    
    return 0;
}

Modern C++ Style without new / delete pair

Use std::vector instead of a C array
When talking to a C function, use std::vector::data() to get the raw data inside
Use auto to improve readability
Use make_shared / make_unique so that operator new never appears

#include <memory>
#include <vector>

// function with C array parameter
void func( int* data )
{
    // do something
}

int main()
{
    // use 'auto' instead of the long code as below
    // std::shared_ptr<std::vector<int>> data =
    //     std::make_shared<std::vector<int>>(64);
    
    // create a shared std::vector<int> with 64 elements
    auto data =
        std::make_shared<std::vector<int>>(64);
    
    // create a vector in which each element holds
    // one shared std::vector<int>
    std::vector<std::shared_ptr<std::vector<int>>> vec;
    
    // push one data into vector
    vec.push_back( data );
    
    // get the raw int array inside the std::vector held by vec[0]
    func( vec.at(0)->data() );
    
    return 0;
}

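In terms of performance, std::vector can be slightly slower than a C-style array when you use at(), because at() adds bounds checking; plain operator[] does not check. shared_ptr is also a bit slower than a raw pointer because of the reference count. Both effects are small and can basically be ignored.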
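
The bounds checking difference can be seen in a small sketch. at_throws is a made-up helper name; the behavior it relies on is that std::vector::at() throws std::out_of_range for an invalid index, while operator[] performs no check at all.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// returns true if reading index 'i' with at() throws std::out_of_range
bool at_throws( const std::vector<int>& v, std::size_t i )
{
    try
    {
        (void)v.at( i ); // checked access
        return false;
    }
    catch ( const std::out_of_range& )
    {
        return true;
    }
}
```

For a vector of 64 elements, index 63 is fine and index 64 throws. Reading v[64] instead would be undefined behavior, just like with a C array.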

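Back to the question in the heading: should we keep using new / delete?

Comparing the code, the C version is clearly shorter and easier to read, and the C++ version is harder to follow. make_unique, for example, only arrived in C++14 (it was added so that code could be written without new / delete), and sometimes the code just looks annoying.

By the way, to compile against a specific standard such as C++17 with gcc, add the flag -std=c++17.

g++ main.cpp -std=c++17

From a safety standpoint, though, once new / delete are gone, memory leaks almost never appear (provided you don't create circular references; when the pointer relationships get complicated, remember weak_ptr).

So I have no particular answer. It's a matter of taste (?)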

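Thread

Put simply, threads are about multitasking. Why would we do two or more things at once? The intuitive answer is that work split among several people gets done faster. But a CPU is not a person. Even with a multi-core CPU, four cores do not mean four times the CPU, and they certainly don't mean the OS can distribute your work across those cores properly just because you started several threads.

So when should we use threads? Waiting on I/O is a good time. I/O includes reading files, network transfers, waiting for user input, updating the display and so on. None of these spend CPU time, so the idle CPU time spent waiting on I/O is the right moment for multithreading to keep the CPU from sitting around.

Threads used to be platform-dependent in C++, but C++11 introduced a standard thread library. Using threads is much simpler than the smart pointers in the first half of this post, so here is an example right away.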

#include <iostream>
#include <thread>

void core_task()
{
    // do some task here
    for ( int i = 0; i < 10000000; i ++ )
    {
    }
}

int main()
{
    // run core_task in a new thread
    std::thread task( core_task );
    
    // wait for the task to complete
    task.join();
    
    return 0;
}

If you compile with gcc, you need to add the flag -lpthread (on current gcc, -pthread is the more usual spelling).

g++ main.cpp -lpthread

The program is just a function called core_task and a thread dedicated to running it. Because the main thread would otherwise finish first, we use join() to block the main thread until core_task is done.
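
To tie this back to the I/O argument, here is a minimal sketch in which two simulated I/O waits run at the same time. fake_io is a made-up stand-in that just sleeps; real file or network calls are assumed to behave similarly while they block.

```cpp
#include <chrono>
#include <thread>

// stand-in for a blocking I/O call (hypothetical; it only sleeps)
void fake_io( int milliseconds )
{
    std::this_thread::sleep_for( std::chrono::milliseconds( milliseconds ) );
}

// runs two 300 ms waits at the same time and returns the elapsed milliseconds
long long overlapped_wait_ms()
{
    auto start = std::chrono::steady_clock::now();

    std::thread a( fake_io, 300 );
    std::thread b( fake_io, 300 );
    a.join();
    b.join();

    auto elapsed = std::chrono::steady_clock::now() - start;
    return std::chrono::duration_cast<std::chrono::milliseconds>( elapsed ).count();
}
```

Run one after the other, the two waits would take about 600 ms. With two threads the total should stay close to 300 ms, because neither wait needs the CPU.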

Thread-Safe

See the example below.

#include <thread>

int g_array[10];

void core_task()
{
    // do something inside g_array
}

int main()
{
    // running a new thread
    std::thread thread_a( core_task );
    std::thread thread_b( core_task );
    
    // wait for both threads complete
    thread_a.join();
    thread_b.join();
    
    return 0;
}

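We started two threads, and both of them work on g_array at the same time. As you can imagine, since we cannot control how fast core_task runs, there are plenty of chances for the two threads to operate on g_array simultaneously, which can give a different result on every run. In this situation we say that g_array, or core_task, is not thread-safe. Put another way, something that is not thread-safe can only be used from a single thread. The situation where multiple threads make the result differ from run to run has a special name: race condition.

So how do we make the example above thread-safe? We need to make sure only one thread at a time can operate on g_array. If the previous thread is still using g_array, the next thread has to wait until it finishes before starting.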

#include <thread>
#include <mutex>

int g_array[10];
std::mutex g_mutex;

void core_task()
{
    // only one thread at a time can execute the
    // section between lock() and unlock()
    g_mutex.lock();
    
    // thread-safe zone
    
    g_mutex.unlock();
}

int main()
{
    // running a new thread
    std::thread thread_a( core_task );
    std::thread thread_b( core_task );
    
    // wait for both threads complete
    thread_a.join();
    thread_b.join();
    
    return 0;
}

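In the example above, the region fenced off by the mutex can be entered and executed by only one thread at a time. If more than one thread tries to enter at once, the others block at lock() until the thread inside calls unlock().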
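
The same idea can be written with std::lock_guard, which locks in its constructor and unlocks in its destructor, much like the smart pointers earlier. This is a minimal sketch; g_counter, add_many and run_two_threads are made-up names for the example.

```cpp
#include <mutex>
#include <thread>

int g_counter = 0;
std::mutex g_counter_mutex;

void add_many( int times )
{
    for ( int i = 0; i < times; ++i )
    {
        // locks here, unlocks automatically at the end of each iteration
        std::lock_guard<std::mutex> guard( g_counter_mutex );
        ++g_counter;
    }
}

int run_two_threads( int times )
{
    g_counter = 0;
    std::thread thread_a( add_many, times );
    std::thread thread_b( add_many, times );
    thread_a.join();
    thread_b.join();
    return g_counter;
}
```

With the guard in place, run_two_threads(100000) always returns 200000. Without it the increments from the two threads could overwrite each other, and the unlock can no longer be forgotten on an early return.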

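Is the STL thread-safe?

This question comes up a lot. Can objects like std::vector be used by different threads at the same time? Is there a mutex inside that prevents race conditions? The answer is a bit tricky. In short: if threads only read, it is thread-safe; if they write, writing to different objects is thread-safe, and writing to the same object (or reading it while another thread writes) is not. Search online for the details. Personally, since the whole point of building a data structure is to read and write it, I still wrap every region that will see multithreaded I/O in a mutex.

A mutex costs performance, so the locked region should be as small as possible and do as little computation as possible. This is something to consider when designing the program; not everything should go inside the mutex. Think about it: if all the logic sits inside a mutex, what is the point of multithreading?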
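
A minimal sketch of keeping the locked region small when two threads write to the same std::vector. The names g_results, slow_square, worker and collect are made up, and slow_square only stands in for some expensive computation.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

std::vector<int> g_results;
std::mutex g_results_mutex;

// stand-in for expensive work that touches no shared data
int slow_square( int x )
{
    return x * x;
}

void worker( int begin, int end )
{
    for ( int i = begin; i < end; ++i )
    {
        int value = slow_square( i ); // computed outside the lock

        std::lock_guard<std::mutex> guard( g_results_mutex );
        g_results.push_back( value ); // only the shared write is locked
    }
}

std::size_t collect()
{
    g_results.clear();
    std::thread a( worker, 0, 500 );
    std::thread b( worker, 500, 1000 );
    a.join();
    b.join();
    return g_results.size();
}
```

collect() returns 1000 every time, although the order of the elements in g_results depends on how the two threads interleave.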

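Are smart pointers thread-safe?

Not really. ^_^ The reference count inside shared_ptr is updated atomically, so separate copies can be created and destroyed in different threads. The object being pointed to, and any single smart pointer instance that several threads modify, get no protection at all, so those still need a mutex.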
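
A minimal sketch of the part that is safe: each thread works with its own copy of a shared_ptr, and the count is still correct afterwards. copy_many and count_after_threads are made-up names for the example.

```cpp
#include <memory>
#include <thread>

// each thread owns its copy 'p' and keeps copying it; the counter updates are atomic
void copy_many( std::shared_ptr<int> p, int times )
{
    for ( int i = 0; i < times; ++i )
    {
        std::shared_ptr<int> local = p; // count + 1
    }                                   // count - 1 at the end of each iteration
}

long count_after_threads()
{
    auto value = std::make_shared<int>( 42 );

    std::thread a( copy_many, value, 100000 );
    std::thread b( copy_many, value, 100000 );
    a.join();
    b.join();

    // all copies made for and by the threads are gone again
    return value.use_count();
}
```

Note that neither thread writes to the int itself. If they did, that write would need a mutex like any other shared data.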

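An Extremely Brief Introduction to Templates

Templates are an important feature of C++, but you rarely get the chance to write one yourself, so only the most basic idea is covered here.

Below are three class declarations: one holds an integer (int), one holds 1 byte of data, and one holds a std::string. Apart from the type of data they store, the three are identical.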

#include <cstdint> // uint8_t
#include <string>

class DataInt
{
public:
    int data_;
};

class DataUint8
{
public:
    uint8_t data_;
};

class DataString
{
public:
    std::string data_;
};

int main()
{
    DataInt data1;
    DataUint8 data2;
    DataString data3;
    return 0;
}

With a template there is no need to write all three, as shown here:

#include <cstdint> // uint8_t
#include <string>

template <typename T>
class DataCommon
{
public:
    T data_;
};

int main()
{
    DataCommon<int> data1;
    DataCommon<uint8_t> data2;
    DataCommon<std::string> data3;
    return 0;
}

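In other object-oriented languages, the usual approach is to create a top-level base class and have all data inherit from it, just as player and monster in the previous post both inherit from moveable and can therefore be stored in the same data structure.

That approach, however, spends time at runtime on checking and converting types. C/C++ is sensitive about efficiency, so it offers templates instead. A template works much like find-and-replace in a text editor: at compile time the compiler generates code for the types used with the template and then compiles it.

For instance, only when you write DataCommon<std::vector<int>> does the compiler generate a new DataCommon class for std::vector<int>; no code is generated for types you never use. Because nothing has to compare types in an inheritance hierarchy while the program runs, this can be considerably more efficient.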
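
The compile-time nature of this can be checked with a small sketch. DataCommon is repeated from the listing above; larger is a made-up function template added only to show that functions can be templated the same way.

```cpp
#include <string>
#include <type_traits>

template <typename T>
class DataCommon
{
public:
    T data_;
};

// a function template: one definition, one generated function per type used
template <typename T>
T larger( T a, T b )
{
    return ( a < b ) ? b : a;
}

// each instantiation is its own, unrelated class, decided at compile time
static_assert( !std::is_same<DataCommon<int>, DataCommon<std::string>>::value,
               "a different T gives a different class" );

// the generated class stores only the data, with no runtime type information
static_assert( sizeof( DataCommon<int> ) == sizeof( int ),
               "DataCommon<int> is just an int" );
```

Calling larger(3, 7) makes the compiler generate an int version, and larger(std::string("a"), std::string("b")) a std::string version. A type that is never used generates nothing.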

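The Standard Template Library (STL) is a C++ library of data structures written entirely with templates, and the std::vector mentioned earlier is part of it. That is how vector can store any type through a template while staying efficient at runtime.

As I said, though, you will rarely need to write a template yourself, so knowing what the angle brackets in the STL mean is enough. :)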

Yih Horng

Programmer and Gamer
