Basic C++ Practices

Overview

  • This document describes some basic and generally applicable practices for writing better C++ code
  • The practices apply to C++23 and beyond
  • This is not meant to be comprehensive but rather to highlight key ideas from elsewhere.

Variable Declaration and Initialization

What?

  1. The rule of thumb here is Almost Always Auto
    • In most cases, when declaring local variables, it makes sense to use auto
  2. Below are some examples:
auto x = 2; // inferred as int
auto y = 2.0; // inferred as double
auto z = 2u;  // inferred as unsigned int
auto q = uint16_t{45}; // explicit type

Why?

  • auto declarations require an initialization
    • In other words, auto x; results in a compilation error
  • With an integer literal such as 2, auto deduces int, the general-purpose integer type for the machine
  • auto does not prevent you from explicitly specifying types

Why Not?

  • Some objects (e.g., in linear algebra libraries) rely on proxy objects to, for example, implement lazy evaluation
  • You usually do not want a variable that refers to these proxy objects
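
The proxy problem can be seen with the standard library alone: std::vector<bool> returns a proxy object from operator[] rather than a real bool. The sketch below is illustrative only (the function name proxy_demo is hypothetical, and it assumes C++17 or later); it shows auto silently capturing the proxy, while an explicit type produces an independent bool.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical demo: returns true if the auto variable tracked a later change
// to the vector, showing that it is a proxy and not an independent bool.
bool proxy_demo()
{
    auto flags = std::vector<bool>{true, false};

    auto proxy = flags[0];                    // deduced as std::vector<bool>::reference
    auto copy = static_cast<bool>(flags[0]);  // explicit type: an independent bool

    static_assert(!std::is_same_v<decltype(proxy), bool>);
    static_assert(std::is_same_v<decltype(copy), bool>);

    flags[0] = false;  // modify the vector after both variables were created

    assert(copy);                      // the copy kept the old value
    return !static_cast<bool>(proxy);  // the proxy observes the new value
}
```

When a library returns proxies like this, naming the type explicitly (as in the copy line) is usually the safer choice.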

Use std::vector<>

What?

  1. A safe, dynamically re-sizable, array
    • Like a Python list
  2. A versatile data structure of first resort
auto v = std::vector<int>{2, 3, 4};
auto x = v.at(2); // 4
v.at(0) = 1; // v == {1, 3, 4}

How?

  1. Always use v.at(index) to index into the vector
    • Ensures run-time check of the index, will throw an exception if out-of-bounds
  2. (Almost) Never use v[index]
    • No run-time check of the bounds
    • Boosts performance, but if your code has bugs can cause major problems
    • Only use if you have determined, through measurements, that you cannot afford the run-time check.
  3. Where possible, avoid indexing and use a range-based for loop, Constrained Algorithms, or the standard algorithms library
    • This is usually a good idea, but in mathematical code indexes can be useful.
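
As a minimal sketch of the bounds check described above, the helper below (at_throws and vector_demo are hypothetical names) reports whether v.at(index) throws std::out_of_range. The same out-of-bounds access through v[index] would be undefined behavior, so it is deliberately not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: reports whether v.at(index) throws std::out_of_range.
bool at_throws(const std::vector<int> & v, std::size_t index)
{
    try
    {
        static_cast<void>(v.at(index));  // the value itself is not needed
        return false;
    }
    catch(const std::out_of_range &)
    {
        return true;
    }
}

// Hypothetical demo: valid indexes work, the first invalid index throws.
bool vector_demo()
{
    auto v = std::vector<int>{2, 3, 4};
    assert(v.at(2) == 4);
    v.at(0) = 1;               // v == {1, 3, 4}
    assert(!at_throws(v, 2));  // last valid index
    return at_throws(v, 3);    // one past the end: throws
}
```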

Why?

  1. std::vector is a relatively straightforward data-structure with a known performance profile
  2. std::vector is usually good enough
  3. Modern computers favor linear algorithms with minimal branching, which fits well with std::vector

Caveats

  1. Don't use std::vector<bool> because it does not conform to the proper container interface
  2. Instead, use std::vector<uint8_t>, std::deque<bool>, or std::bitset.
    • All have various trade-offs; std::vector<uint8_t> at least behaves like a normal vector
  3. See Meyers, Scott Effective STL, Chapter 2, Item 18
  4. std::vector<bool> is a fine data structure, just named incorrectly

Alternatives?

std::array

For fixed-size arrays, you can use std::array

  • Need to know the size at compile time
  • Can be faster than std::vector, but that performance boost is likely unnecessary unless measurements show that it is needed
  • Still provides .at() for bounds-checked access
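
A short sketch of std::array follows (array_demo is a hypothetical name). The size is part of the type, and .at() behaves the same way as it does for std::vector.

```cpp
#include <array>
#include <cassert>
#include <stdexcept>

// Hypothetical demo: returns true if out-of-bounds access via .at() throws.
bool array_demo()
{
    auto a = std::array<int, 3>{2, 3, 4};  // the size (3) is fixed at compile time
    assert(a.size() == 3);

    a.at(0) = 1;  // bounds-checked write
    assert(a.at(0) == 1);

    try
    {
        a.at(3) = 0;  // index 3 is out of bounds
    }
    catch(const std::out_of_range &)
    {
        return true;
    }
    return false;
}
```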

built-in array

There is usually no need to use built-in arrays, especially in this class

Use Range-based for loops

  1. In C++ you can iterate over ranges of items using a range-based for loop
    • Basically anything with a .begin() and .end() iterator can be iterated over with such a loop
    • std::vector is one such container that satisfies this requirement
  2. There are three primary forms to use. Which to use essentially follows the same rules as parameter passing.
    1. for(auto v : myvector)
      • Here, each element in myvector is copied into v
      • This mode is appropriate when the type held by myvector is small (e.g., int, double, etc.)
    2. for(const auto & v : myvector)
      • Here, each v is presented as a reference to a const element of myvector
      • The for loop may inspect but may not call non-const methods on the object
      • This mode is appropriate if the type held by myvector is large/custom (e.g., a class you wrote)
    3. for(auto & v : myvector)
      • Here, each element is presented as a reference to a mutable element of myvector
      • This mode is appropriate only if you wish to modify the elements in myvector in the loop
  3. Don't force yourself into using range-based for: sometimes indexes are useful, so if you need them, use them
  4. The appropriate type for indexing the elements of the vector is std::size_t: for(std::size_t i = 0; i < myvector.size(); ++i)
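
The three loop forms and the index-based loop can be sketched together as below. The function name range_for_demo and the sample data are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical demo of the loop forms; returns true if every loop behaved as described.
bool range_for_demo()
{
    auto numbers = std::vector<int>{1, 2, 3};
    auto words = std::vector<std::string>{"ab", "cde"};

    // Form 1: each element is copied (cheap for small types such as int)
    auto sum = 0;
    for(auto n : numbers)
    {
        sum += n;
    }

    // Form 2: reference to const (no std::string is copied, read-only access)
    auto length = std::size_t{0};
    for(const auto & w : words)
    {
        length += w.size();
    }

    // Form 3: mutable reference (modifies the elements in place)
    for(auto & n : numbers)
    {
        n *= 2;
    }
    assert(numbers == (std::vector<int>{2, 4, 6}));

    // Index-based loop, for when the index itself is needed
    for(std::size_t i = 0; i < numbers.size(); ++i)
    {
        numbers.at(i) += static_cast<int>(i);
    }

    return sum == 6 && length == 5 && numbers == std::vector<int>{2, 5, 8};
}
```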

Memory

Overview

  • C++ provides programmers with direct access to the memory where variables and objects are stored
  • This provides absolute control over the program, but also creates the chance to introduce bugs

The Stack

  1. The stack is a part of memory where local variables are stored
    • Local variables also include the parameters to the function
  2. When a function is called, its local variables are placed on the stack
  3. When a function returns, its local variables are removed from the stack
  4. Thus, the memory used for storing local variables
    • Is automatically allocated upon entering a function
    • Is automatically de-allocated upon leaving a function
  5. Local variables do not persist beyond a function call

The Heap

  1. The heap is an area of memory managed by the programmer
  2. Objects can be allocated on the heap with the new operator
    • Calling new allocates memory and calls the object's constructor
  3. Objects can be de-allocated from the heap with the delete operator
    • Calling delete calls the object's destructor and de-allocates its memory
  4. new [] and delete [] can allocate and de-allocate arrays of objects
  5. If your program forgets to de-allocate memory (e.g., does not call delete) it can leak memory
  6. If your program accidentally de-allocates the same memory twice, it can crash or create a security vulnerability
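
To contrast the stack and the heap, the sketch below uses a hypothetical type, Tracked, that counts how many of its instances are alive (it assumes C++17 or later for the inline static member). It is for illustration only; these notes recommend avoiding new and delete in real code.

```cpp
#include <cassert>

// Hypothetical type that counts how many instances are currently alive.
struct Tracked
{
    static inline int alive = 0;
    Tracked() { ++alive; }
    ~Tracked() { --alive; }
};

// Hypothetical demo: returns the number of live objects at the end (expected to be 0).
int lifetime_demo()
{
    {
        auto on_stack = Tracked{};  // allocated automatically on entering the scope
        assert(Tracked::alive == 1);
    }  // on_stack is destroyed automatically on leaving the scope
    assert(Tracked::alive == 0);

    auto * on_heap = new Tracked{};  // allocates memory and calls the constructor
    assert(Tracked::alive == 1);     // persists until it is explicitly deleted
    delete on_heap;                  // calls the destructor and de-allocates, exactly once

    return Tracked::alive;
}
```

Omitting the delete line would leak the object, and repeating it would de-allocate the same memory twice.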

Pointers

  • Every variable has a memory address.
  • Pointers store the memory address of a variable
  • In Modern C++ pointers should almost never be used.
    • However, a basic understanding of pointers is useful

Consider the following code:

auto var = 20;
// equivalent to: int * pvar = &var;
auto pvar = &var;
// equivalent to: int ** ppvar = &pvar;
auto ppvar = &pvar;

Memory might be laid out as below:

Address   Value        Variable
0x1000    20           var
0x1004    0x00001000   pvar
0x1008    0x00001004   ppvar

  • The variable var holds 20.
  • The address of var is &var and it is 0x1000 (in reality the address would be 64 bits on an x86_64 machine).
  • The pointer pvar == &var.
    • To access var through pvar it can be de-referenced.
    • For example, *pvar == 20, and after *pvar = 40; it holds that var == 40
  • Pointers are in-and-of-themselves variables and are stored at memory locations
    • &pvar == 0x1004 in this case
  • nullptr indicates that a pointer is invalid and does not point to anything
    • nullptr pointers should never be dereferenced
  • Pointers allow passing the address of data, rather than copying the data itself
    • Only the address needs to be copied, not the whole data structure
  • Pointers can be re-assigned to point to new objects
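
The points above can be exercised in a few lines. In this sketch, pointer_demo and the extra variable other are hypothetical names added for illustration.

```cpp
#include <cassert>

// Hypothetical demo: returns the final value of var (expected to be 40).
int pointer_demo()
{
    auto var = 20;
    auto other = 7;
    auto pvar = &var;    // int *
    auto ppvar = &pvar;  // int **

    assert(*pvar == 20);  // de-reference to read var
    *pvar = 40;           // de-reference to write var
    assert(var == 40);
    assert(**ppvar == 40);

    pvar = &other;         // pointers can be re-assigned
    assert(**ppvar == 7);  // ppvar still points at pvar, which now points at other

    pvar = nullptr;  // pvar no longer points to anything
    if(pvar != nullptr)
    {
        return *pvar;  // only de-reference after checking for nullptr
    }
    return var;
}
```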

References

  1. References are like pointers that:
    • Are never nullptr
    • Cannot be re-assigned
    • Are always automatically dereferenced
  2. Taking the address of a reference returns the address of the underlying variable
  3. References are primarily used for two purposes
    1. Passing objects to functions
    2. Referring to the current element in range-based for loops
  4. Here is an example of how they work (illustrative only)

    auto x = 2;
    auto & y = x;
    y = 3;
    // now x == 3
    

Smart Pointers

shared_ptr

  1. shared_ptr<T> creates a reference-counted smart pointer
  2. It refers to an object obj of type T
  3. Models the pattern of shared ownership
  4. When all shared_ptr to obj are destroyed, obj is destroyed
    • As long as you hold a shared_ptr to obj you know that it will not be destroyed
  5. Does not handle circular references
    • Object obj1 holds a shared_ptr to obj2
    • Object obj2 holds a shared_ptr to obj1
      • Now obj1 can't be destroyed unless obj2 is destroyed
      • But obj2 can't be destroyed unless obj1 is destroyed
  6. Create with std::make_shared
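
A minimal sketch of shared ownership follows, assuming a hypothetical type Resource that counts its live instances (C++17 or later). The object survives for as long as at least one shared_ptr refers to it.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type that counts how many instances are currently alive.
struct Resource
{
    static inline int alive = 0;
    Resource() { ++alive; }
    ~Resource() { --alive; }
};

// Hypothetical demo: returns the number of live Resource objects at the end (expected 0).
int shared_demo()
{
    auto first = std::make_shared<Resource>();
    assert(first.use_count() == 1);
    {
        auto second = first;  // copying the shared_ptr shares ownership
        assert(first.use_count() == 2);

        first.reset();                 // first gives up its ownership
        assert(Resource::alive == 1);  // second still keeps the object alive
    }  // second is destroyed here; it was the last owner, so the object is destroyed
    return Resource::alive;
}
```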

unique_ptr

  1. unique_ptr<T> creates a smart pointer with single ownership
  2. Only one unique_ptr at a time can reference obj
  3. When the unique_ptr goes out of scope, obj is destroyed
  4. Cannot be copied
  5. Ownership can be transferred to a new unique_ptr by moving from it
  6. Create with std::make_unique
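
Single ownership and its transfer can be sketched as follows (unique_demo is a hypothetical name). The commented-out line shows the copy that the compiler rejects.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical demo: returns the value held by the final owner (expected 42).
int unique_demo()
{
    auto first = std::make_unique<int>(42);

    // auto copy = first;            // does not compile: unique_ptr cannot be copied
    auto second = std::move(first);  // ownership is transferred to second

    assert(first == nullptr);   // the moved-from unique_ptr is now empty
    assert(second != nullptr);  // second is the only owner

    return *second;
}  // second goes out of scope here and the int is destroyed
```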

Recommendations

  1. In general, pointers should be avoided. They can be used in a few situations:
    • Interfacing with C code that uses pointers
    • Non-owning use in contexts when references don't make sense.
      • Someone else is responsible for managing memory lifetime
      • The value pointed at may be nullptr
      • The context is not a parameter passing context
  2. References should be used only for parameter passing and range-based for loops
    • Don't use references as member variables (they prevent assigning to your object)
  3. In most cases you don't need to allocate your own memory
    1. Use containers like std::vector instead
    2. std::shared_ptr<>, created with std::make_shared
    3. std::unique_ptr<>, created with std::make_unique
    4. There is no need to use new or delete in this class
  4. If you must allocate memory (e.g., to make your own data structures), prefer std::make_unique or std::make_shared over new and delete

Parameter Passing

Pass By Value

  1. Pass by value is the preferred method for passing small objects to functions
  2. Consider void func(MyObject obj)
    • In this example, obj is passed by value
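    • This means that obj is a new object, initialized by calling the copy constructor of MyObject (or its move constructor, when the caller passes the argument with std::move)
      • If MyObject can be neither copied nor moved, pass by value cannot be used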
    • Copying large objects can be time consuming, but is trivial for small objects

Pass By Reference to Const

  1. Pass by reference to const is a method for passing objects without requiring copying
  2. Consider void func(const MyClass & obj) and void func(MyClass const & obj)
    • Both examples are exactly the same semantically
    • The first is more common, the second is more consistent with other uses of const
    • No copy is made when passing by reference to const
    • func cannot modify obj (see Logical const for precisely how/why)
    • func borrows obj
      • The usage of obj must end when func returns
      • This means that func should not store a pointer/reference to obj anywhere
        • Some other part of the code owns obj and could destroy obj, which would lead to a dangling reference (very bad)
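
To make the cost difference between the two passing styles concrete, the sketch below counts copy-constructor calls. The type Counted and the functions by_value, by_const_ref, and passing_demo are hypothetical, and C++17 or later is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type that counts how many times it has been copied.
struct Counted
{
    static inline int copies = 0;
    std::vector<double> data;

    Counted() = default;
    Counted(const Counted & other) : data(other.data) { ++copies; }
};

// Pass by value: the function receives its own copy.
std::size_t by_value(Counted obj) { return obj.data.size(); }

// Pass by reference to const: the function borrows the caller's object.
std::size_t by_const_ref(const Counted & obj) { return obj.data.size(); }

// Hypothetical demo: returns the number of copies made (expected 1).
int passing_demo()
{
    auto big = Counted{};
    big.data.assign(1000, 1.0);
    Counted::copies = 0;

    const auto n_ref = by_const_ref(big);  // no copy is made
    const auto copies_after_ref = Counted::copies;
    const auto n_val = by_value(big);      // the copy constructor runs once

    assert(copies_after_ref == 0);
    assert(n_ref == n_val);
    return Counted::copies;
}
```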

Pass by shared_ptr

  1. Pass the shared_ptr object by value: void myfunc(shared_ptr<MyClass> obj)
  2. Shared ownership means that myfunc can retain a shared_ptr to obj
  3. obj is destroyed when there are no more shared_ptr pointing to obj
    • In other words, all the shared owners can use the object
    • When the shared owners are done, the object is destroyed
  4. Edge Case: Does not work if there are circular references (see shared_ptr above)
  5. Just because you have a shared_ptr to an object, does not mean you should pass by shared_ptr
    • In most cases it is appropriate to de-reference the pointer and pass by const reference
    • Only if the method that is called must retain the object after it is finished do you need to share ownership
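
The distinction in the last point can be sketched as below. Sensor, Registry, current_reading, and sharing_demo are hypothetical names: one function only inspects the object, so it borrows it by reference to const, while the class retains it and therefore shares ownership.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical type used for illustration.
struct Sensor
{
    int reading = 0;
};

// Only inspects the object during the call: borrow by reference to const.
int current_reading(const Sensor & s) { return s.reading; }

// Retains the object after the call returns: share ownership.
class Registry
{
public:
    void add(std::shared_ptr<Sensor> s) { sensors_.push_back(std::move(s)); }
    std::size_t size() const { return sensors_.size(); }

private:
    std::vector<std::shared_ptr<Sensor>> sensors_;
};

// Hypothetical demo: returns the final owner count (expected 2).
long sharing_demo()
{
    auto sensor = std::make_shared<Sensor>();
    sensor->reading = 3;

    assert(current_reading(*sensor) == 3);  // de-reference and pass by const reference
    assert(sensor.use_count() == 1);        // borrowing did not add an owner

    auto registry = Registry{};
    registry.add(sensor);  // the registry becomes a second owner
    assert(registry.size() == 1);

    return sensor.use_count();
}
```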

Pass by unique_ptr

  1. Pass the unique_ptr by value: void func(unique_ptr<MyClass> obj)
  2. The caller must transfer ownership to func using std::move:
  3. func now becomes responsible for managing the lifetime of obj
  4. Here is an example

    auto obj = std::make_unique<MyClass>(/* MyClass constructor arguments */);
    func(std::move(obj));
    // obj no longer owns the object and should not be used
    

Other Methods

These notes don't explain these methods, but they are here for completeness

  1. Mutable reference: void myfunc(MyClass & out)
    • Like Python pass-by-reference
    • An implementation of "Output Parameters"
    • out can be modified in myfunc and those changes are seen outside the function
    • Not usually needed because in C++ it is okay to Return By Value
  2. rvalue reference: void myfunc(MyClass && rvalue)
  3. forwarding reference: template<class T> void myfunc(T && myval)
  4. Pointer: void myfunc(MyClass * p)
    • Useful for interfacing with C
    • The pointer can also be const

Returning Objects

  • In modern C++, there are powerful guarantees of Copy Elision
  • In many cases, returning an object by value will not incur additional costs
  • As for all performance-related questions: if you are concerned that returning by value is too costly, measure.
  • To return multiple objects by value use std::pair or std::tuple

    std::tuple<int, char, double> stuff()
    {
        // The return type is known so we can
        // call the constructor directly with braces
        return {1, 'b', 2.0};
    }
    
    // structured bindings let us get separate variables
    // for each item in the tuple easily
    auto [x, y, z] = stuff();
    // x == 1
    // y == 'b'
    // z == 2.0
    

Logical const

  1. In C++ const means logically constant, meaning that outside observers cannot see that the object has changed
  2. logical const is conceptually different than bitwise-const: the internal state of an object is allowed to change
  3. The status of const is enforced by the compiler
    • const auto x = 2 means that the value of x will not change
    • void myfunc(const MyClass & obj) means that
      • Only const member functions of obj can be called
      • public member variables of obj are treated as if declared const
    • void MyClass::member() const
      • Within this method, the member variables are treated as if declared const
  4. Effectively:
    • Only const methods can be called on const objects
    • const methods can't modify the object
    • Therefore the compiler enforces const
  5. In practice:
    • It is possible to circumvent const
    • The mutable keyword allows a member variable to be changed by a const function
      • This is okay only if the change is not observable outside the class and is thread safe
      • There are sometimes reasons to do this (e.g., caching computations), but it's much easier if you avoid it
    • const_cast<> can be used to cast constness away
      • Basically never do this; modifying an object that was declared const is undefined behavior (for example, the data may be stored in read-only memory)
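
As a sketch of logical const with the mutable keyword, the hypothetical class Samples below caches the result of a computation inside a const member function. Callers cannot observe the change, so the object remains logically constant. Note that this sketch is not thread safe; a real implementation would need to protect the cache.

```cpp
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical class that caches its sum the first time it is requested.
class Samples
{
public:
    explicit Samples(std::vector<double> values) : values_(std::move(values)) {}

    // const member function: outside observers cannot see any change
    double sum() const
    {
        if(!cache_)
        {
            auto total = 0.0;
            for(auto v : values_)
            {
                total += v;
            }
            cache_ = total;  // allowed only because cache_ is declared mutable
        }
        return *cache_;
    }

private:
    std::vector<double> values_;
    mutable std::optional<double> cache_;  // internal state, not observable by callers
};

// Hypothetical demo: returns the cached sum (expected 6.5).
double const_demo()
{
    const auto samples = Samples{std::vector<double>{1.0, 2.0, 3.5}};
    assert(samples.sum() == 6.5);  // first call computes and caches
    return samples.sum();          // second call returns the cached value
}
```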

Author: Matthew Elwin.