Basic C++ Practices

Overview

This document describes some basic and generally applicable practices for writing better C++ code
The practices apply to C++23 and beyond
This is not meant to be comprehensive but rather to highlight key ideas from elsewhere.

Variable Declaration and Initialization

What?

The rule of thumb here is Almost Always Auto
- In most cases, when declaring local variables, it makes sense to use auto
Below are some examples:

auto x = 2; // inferred as int
auto y = 2.0; // inferred as double
auto z = 2u;  // inferred as unsigned int
auto q = uint16_t{45}; // explicit type

Why?

auto declarations require an initialization
- In other words, auto x; results in a compilation error
For an integer literal such as 2, auto deduces int, the general-purpose integer type for the machine
auto does not prevent you from explicitly specifying types
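
The deductions listed above can be checked at compile time. The following is a minimal sketch; the function name deduced_examples is hypothetical and exists only for this illustration.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical helper, used only to illustrate what auto deduces.
int deduced_examples()
{
    auto i = 2;                  // int
    auto d = 2.0;                // double
    auto u = 2u;                 // unsigned int
    auto q = std::uint16_t{45};  // the type is still spelled out explicitly

    static_assert(std::is_same_v<decltype(i), int>);
    static_assert(std::is_same_v<decltype(d), double>);
    static_assert(std::is_same_v<decltype(u), unsigned int>);
    static_assert(std::is_same_v<decltype(q), std::uint16_t>);

    // auto n;  // would not compile: auto requires an initializer

    return i + static_cast<int>(d) + static_cast<int>(u) + q;
}
```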

Why Not?

Some objects (e.g., in linear algebra libraries) rely on proxy objects to, for example, implement lazy evaluation
You usually do not want a variable that refers to these proxy objects
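
The standard library has its own proxy type, which makes the problem easy to reproduce without a linear algebra library. The sketch below uses std::vector<bool> as a stand-in (an assumption of this example, not the libraries mentioned above); the function name is hypothetical.

```cpp
#include <type_traits>
#include <vector>

// Hypothetical helper showing that auto captures the proxy, not the value.
bool proxy_tracks_container()
{
    auto flags = std::vector<bool>{true, true};

    auto proxy = flags[0];  // deduced as std::vector<bool>::reference, not bool
    bool copy = flags[0];   // naming the type converts the proxy to a real bool

    static_assert(!std::is_same_v<decltype(proxy), bool>);

    flags[0] = false;       // change the container afterwards

    // The proxy follows the container; the copy keeps the old value.
    return (static_cast<bool>(proxy) == false) && (copy == true);
}
```

When a library returns proxies, spelling out the intended type (as with copy above) is one way to force the evaluation.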

Use std::vector<>

What?

A safe, dynamically re-sizable, array
- Like a Python list
A versatile data structure of first resort

auto v = std::vector<int>{2, 3, 4};
auto x = v.at(2); // 4
v.at(0) = 1; // v == {1, 3, 4}

How?

Always use v.at(index) to index into the vector
- Ensures run-time check of the index, will throw an exception if out-of-bounds
(Almost) Never use v[index]
- No run-time check of the bounds
- Boosts performance, but if your code has bugs can cause major problems
- Only use if you have determined, through measurements, that you cannot afford the run-time check.
Where possible, avoid indexing and use a range-based for loop, the constrained algorithms (std::ranges) or the standard algorithms (<algorithm>)
- Usually it's a good idea, but when doing mathematical code indexes can be useful.
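
The run-time check performed by at() can be sketched as follows. The helpers checked_get and sum are hypothetical names introduced for this example; std::accumulate stands in for the index-free style.

```cpp
#include <cstddef>
#include <numeric>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical helper: returns the element, or nothing if index is out of bounds.
std::optional<int> checked_get(const std::vector<int>& v, std::size_t index)
{
    try {
        return v.at(index);  // throws std::out_of_range when index >= v.size()
    } catch (const std::out_of_range&) {
        return std::nullopt;
    }
}

// Hypothetical helper: no indexing needed, so no index can be wrong.
int sum(const std::vector<int>& v)
{
    return std::accumulate(v.begin(), v.end(), 0);
}
```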

Why?

std::vector is a relatively straightforward data-structure with a known performance profile
std::vector is usually good enough
Modern computers favor linear algorithms with minimal branching, which fits well with std::vector

Caveats

Don't use std::vector<bool> because it does not conform to the proper container interface
Use std::vector<uint8_t>, std::deque<bool> or std::bitset.
- All have various trade-offs, std::vector<uint8_t> at least behaves like a normal vector
See Meyers, Scott Effective STL, Chapter 2, Item 18
std::vector<bool> is a fine data structure, just named incorrectly

Alternatives?

std::array

For fixed-size arrays, you can use std::array

Need to know the size at compile time
Can outperform std::vector, but that performance boost is likely unnecessary unless measurements show that you need it
Still provides .at() for bounds-checked access
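
A minimal sketch of std::array with bounds-checked access; the function name is hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: std::array offers the same checked access as std::vector.
bool array_at_is_checked()
{
    auto a = std::array<int, 3>{2, 3, 4};  // the size is part of the type
    a.at(0) = 1;                           // bounds-checked write

    const std::size_t bad = a.size();      // one past the last valid index
    try {
        a.at(bad) = 0;                     // throws std::out_of_range
    } catch (const std::out_of_range&) {
        return a.size() == 3 && a.at(0) == 1;
    }
    return false;                          // not reached: at() always throws above
}
```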

built-in array

There is usually no need to use built-in arrays, especially in this class

Use Range-based for loops

In C++ you can iterate over ranges of items using a range-based for loop
- Basically anything with a .begin() and .end() iterator can be iterated over with such a loop
- std::vector is one such container that satisfies this requirement
There are three primary forms to use. Which to use essentially follows the same rules as parameter passing.
1. for(auto v : myvector)
  - Here, each element in myvector is copied into v
  - This mode is appropriate when the type held by myvector is small (e.g., int, double, etc.)
2. for(const auto & v : myvector)
  - Here, each v is presented as a reference to a const element of myvector
  - The for loop may inspect the object but may not modify it or call non-const methods on it
  - This mode is appropriate if the type held by myvector is large/custom (e.g., a class you wrote)
3. for(auto & v : myvector)
  - Here, each element is presented as a reference to a mutable element of myvector
  - This mode is appropriate only if you wish to modify the elements in myvector in the loop
Don't force yourself into using range-based for: sometimes indexes are useful, so if you need them, use them
The appropriate index type for iterating over the elements of the vector is std::size_t, e.g., for(std::size_t i = 0; i < myvector.size(); ++i)
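
The three loop forms and the indexed fallback might look as follows in practice. All function names here are hypothetical and chosen only for this sketch.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Form 1: copy each element (cheap for int).
int sum_by_value(const std::vector<int>& values)
{
    int total = 0;
    for (auto v : values) { total += v; }
    return total;
}

// Form 2: read-only reference (avoids copying each std::string).
std::size_t total_length(const std::vector<std::string>& words)
{
    std::size_t total = 0;
    for (const auto& w : words) { total += w.size(); }
    return total;
}

// Form 3: mutable reference (modifies the elements in place).
void double_in_place(std::vector<int>& values)
{
    for (auto& v : values) { v *= 2; }
}

// Indexed loop: useful when neighbouring elements are needed.
std::vector<int> adjacent_differences(const std::vector<int>& values)
{
    std::vector<int> out;
    for (std::size_t i = 1; i < values.size(); ++i) {
        out.push_back(values.at(i) - values.at(i - 1));
    }
    return out;
}
```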

Memory

Overview

C++ provides programmers with direct access to the memory where variables and objects are stored
This provides absolute control over the program, but also creates the chance to introduce bugs

The Stack

The stack is a part of memory where local variables are stored
- Local variables also include the parameters to the function
When a function is called, its local variables are placed on the stack
When a function returns, its local variables are removed from the stack
Thus, the memory used for storing local variables
- Is automatically allocated upon entering a function
- Is automatically de-allocated upon leaving a function
Local variables do not persist beyond a function call
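
The automatic allocation and de-allocation can be made visible with a type that counts its live instances. This is a sketch; Tracked and alive_inside_function are hypothetical names.

```cpp
// Hypothetical type that counts how many instances currently exist.
struct Tracked {
    static inline int alive = 0;
    Tracked() { ++alive; }
    Tracked(const Tracked&) { ++alive; }
    ~Tracked() { --alive; }
};

int alive_inside_function()
{
    Tracked local;          // created on the stack when the function is entered
    return Tracked::alive;  // the count is read while local still exists
}                           // local is destroyed automatically here
```

After alive_inside_function returns, Tracked::alive is back to its previous value without any explicit clean-up code.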

The Heap

The heap is an area of memory managed by the programmer
Objects can be allocated on the heap with the new operator
- Calling new allocates memory and calls the object's constructor
Objects can be de-allocated from the heap with the delete operator
- Calling delete calls the object's destructor and de-allocates its memory
new [] and delete [] can allocate and de-allocate arrays of objects
If your program forgets to de-allocate memory (e.g., does not call delete) it can leak memory
If your program accidentally de-allocates memory twice it can crash or create a security bug

Pointers

Every variable has a memory address.
Pointers store the memory address of a variable
In Modern C++ pointers should almost never be used.
- However, a basic understanding of pointers is useful

Consider the following code:

auto var = 20;
//int * pvar = &var;
auto pvar = &var;
//int ** ppvar = &pvar;
auto ppvar = &pvar;

Memory might be laid out as below:

Address	Value	Variable
0x1000	20	var
0x1004	0x00001000	pvar
0x1008	0x00001004	ppvar

The variable var holds 20.
The address of var is &var and it is 0x1000 (in reality the address would be 64 bits on an x86_64 machine).
The pointer pvar == &var.
- To access var through pvar it can be de-referenced.
- For example, *pvar == 20, and after *pvar = 40; we have var == 40
Pointers are in-and-of-themselves variables and are stored at memory locations
- &pvar == 0x1004 in this case
nullptr indicates that a pointer is invalid and does not point to anything
- nullptr pointers should never be dereferenced
Pointers allow passing the address of data, rather than copying the data itself
- Only the address needs to be copied, not the whole data structure
Pointers can be re-assigned to point to new objects
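
The points above (dereferencing, pointers to pointers, re-assignment, and nullptr) can be exercised in one short sketch. The function name and the variable other are hypothetical additions for this example.

```cpp
// Hypothetical helper that walks through the pointer operations described above.
bool pointer_demo()
{
    auto var = 20;
    auto other = 7;
    auto pvar = &var;     // int*
    auto ppvar = &pvar;   // int**

    *pvar = 40;           // writes to var through the pointer
    bool ok = (var == 40) && (**ppvar == 40);

    pvar = &other;        // re-assign: pvar now points to other
    ok = ok && (*pvar == 7) && (**ppvar == 7) && (var == 40);

    int* nothing = nullptr;
    if (nothing != nullptr) {  // check before dereferencing; this branch is skipped
        ok = false;
    }
    return ok;
}
```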

References

References are like pointers that:
- Are never nullptr
- Cannot be re-assigned
- Are always dereferenced automatically
Taking the address of a reference returns the address of the underlying variable
References are primarily used for two purposes
1. Passing objects to functions
2. Referring to the current element in range-based for loops

Here is an example of how they work (illustrative only)

auto x = 2;
auto & y = x;
y = 3;
// now x == 3
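
A slightly longer sketch shows the address-of behavior and why a reference cannot be re-assigned: assigning to it writes through to the underlying variable. The function name and the variable z are hypothetical.

```cpp
// Hypothetical helper illustrating the reference properties listed above.
bool reference_demo()
{
    auto x = 2;
    auto& y = x;   // y is another name for x
    y = 3;         // x == 3

    auto z = 10;
    y = z;         // copies the value of z into x; y still refers to x
    z = 11;        // does not affect x or y

    return (&y == &x) && (x == 10) && (y == 10);
}
```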

Smart Pointers

shared_ptr

shared_ptr<T> creates a reference-counted smart pointer
It refers to an object obj of type T
Models the pattern of shared ownership
When all shared_ptr to obj are destroyed, obj is destroyed
- As long as you hold a shared_ptr to obj you know that it will not be destroyed
Does not handle circular references
- Object obj1 holds a shared_ptr to obj2
- Object obj2 holds a shared_ptr to obj1
  - Now obj1 can't be destroyed unless obj2 is destroyed
  - But obj2 can't be destroyed unless obj1 is destroyed
Create with std::make_shared
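
Shared ownership can be observed through use_count() and a type that counts its live instances. This is a minimal sketch; Counted and shared_lifetime_demo are hypothetical names.

```cpp
#include <memory>

// Hypothetical type that counts how many instances currently exist.
struct Counted {
    static inline int alive = 0;
    Counted() { ++alive; }
    ~Counted() { --alive; }
};

bool shared_lifetime_demo()
{
    auto first = std::make_shared<Counted>();
    bool ok = (Counted::alive == 1);
    {
        auto second = first;                  // copy: two owners, one object
        ok = ok && (first.use_count() == 2);
    }                                         // second is destroyed, the object is not
    ok = ok && (first.use_count() == 1) && (Counted::alive == 1);

    first.reset();                            // last owner gone: object destroyed
    return ok && (Counted::alive == 0);
}
```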

unique_ptr

unique_ptr<T> creates a smart pointer with single ownership
Only one unique_ptr at a time can reference obj
When the unique_ptr goes out of scope, obj is destroyed
Cannot be copied
Ownership can be transferred to a new unique_ptr by moving from it
Create with std::make_unique
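
A minimal sketch of transferring ownership between two unique_ptr variables; the function name is hypothetical.

```cpp
#include <memory>
#include <utility>

// Hypothetical helper: ownership moves, it is never duplicated.
bool unique_transfer_demo()
{
    auto first = std::make_unique<int>(42);

    // auto copy = first;            // would not compile: unique_ptr cannot be copied
    auto second = std::move(first);  // ownership moves to second

    // A moved-from unique_ptr is guaranteed to be empty.
    return (first == nullptr) && (second != nullptr) && (*second == 42);
}
```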

Recommendations

In general, pointers should be avoided. They can be used in a few situations:
- Interfacing with C code that uses pointers
- Non-owning use in contexts when references don't make sense.
  - Someone else is responsible for managing memory lifetime
  - The value pointed at may be nullptr
  - The context is not a parameter passing context
References should be used only for parameter passing and range-based for loops
- Don't use references as member variables (they prevent assigning to your object)
In most cases you don't need to allocate your own memory
1. Use containers like std::vector instead
2. std::shared_ptr<>, created with std::make_shared
3. std::unique_ptr<>, created with std::make_unique
4. There is no need to use new or delete in this class
If you must allocate memory (e.g., to make your own data structures)
- Use Objects: Memory is allocated in constructors and deallocated in destructors
- See: Scott Meyers Effective C++, Chapter 3, Item 13
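
If you do have to manage memory yourself, the allocation and de-allocation can be tied to an object's lifetime. The class below is a hypothetical sketch of that idea (IntBuffer is not from any library); copying is disabled to keep the example short and to rule out double deletion.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical class: owns a heap array for exactly as long as it lives.
class IntBuffer {
public:
    explicit IntBuffer(std::size_t size)
        : data_(new int[size]()), size_(size) {}  // allocate (zero-initialized)
    ~IntBuffer() { delete[] data_; }              // de-allocate, exactly once

    // Copying would make two objects delete the same array, so forbid it.
    IntBuffer(const IntBuffer&) = delete;
    IntBuffer& operator=(const IntBuffer&) = delete;

    std::size_t size() const { return size_; }

    int& at(std::size_t index)
    {
        if (index >= size_) { throw std::out_of_range("IntBuffer::at"); }
        return data_[index];
    }

private:
    int* data_;
    std::size_t size_;
};
```

In real code, std::vector<int> already provides all of this, which is why it is listed first in the recommendations.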

Parameter Passing

Pass By Value

Pass by value is the preferred method for passing small objects to functions
Consider void func(MyObject obj)
- In this example, obj is passed by value
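- This means that obj is initialized by the copy constructor of MyObject (or by its move constructor when the argument is a temporary or is passed with std::move)
  - If MyObject can be neither copied nor moved, pass by value cannot be used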
- Copying large objects can be time consuming, but is trivial for small objects

Pass By Reference to Const

Pass by reference to const is a method for passing objects without requiring copying
Consider void func(const MyClass & obj) and void func(MyClass const & obj)
- Both examples are exactly the same semantically
- The first is more common, the second is more consistent with other uses of const
- No copy is made when passing by reference to const
- func cannot modify obj (see Logical const for precisely how/why)
- func borrows obj
  - The usage of obj must end when func returns
  - This means that func should not store a pointer/reference to obj anywhere
    - Some other part of the code owns obj and could destroy obj, which would lead to a dangling reference (very bad)
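
The choice between the two passing styles can be sketched as follows. Both functions are hypothetical examples: one takes a small object by value, the other borrows a large object by reference to const and keeps no reference to it after returning.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Small object: pass by value.
int twice(int n)
{
    return 2 * n;
}

// Large object: pass by reference to const. The function only borrows words.
std::size_t count_long_words(const std::vector<std::string>& words,
                             std::size_t min_length)
{
    std::size_t count = 0;
    for (const auto& w : words) {
        if (w.size() >= min_length) { ++count; }
    }
    // words.clear();  // would not compile: words is const here
    return count;
}
```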

Pass by shared_ptr

Pass the shared_ptr object by value: void myfunc(shared_ptr<MyClass> obj)
Shared ownership means that myfunc can retain a shared_ptr to obj
obj is destroyed when there are no more shared_ptr pointing to obj
- In other words, all the shared owners can use the object
- When the shared owners are done, the object is destroyed
Edge case: does not work if there are circular references
Just because you have a shared_ptr to an object, does not mean you should pass by shared_ptr
- In most cases it is appropriate to de-reference the pointer and pass by const reference
- Only if the method that is called must retain the object after it is finished do you need to share ownership
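
The distinction between retaining and merely using an object might look like this. Document, Archive, and title_length are hypothetical names for this sketch.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type used for illustration.
struct Document {
    std::string title;
};

// Retains the object after the call returns, so it shares ownership.
class Archive {
public:
    void keep(std::shared_ptr<Document> doc) { kept_.push_back(std::move(doc)); }
    std::size_t size() const { return kept_.size(); }

private:
    std::vector<std::shared_ptr<Document>> kept_;
};

// Only inspects the object during the call, so it borrows by reference to const.
std::size_t title_length(const Document& doc)
{
    return doc.title.size();
}
```

A caller holding auto doc = std::make_shared<Document>(...) would write archive.keep(doc) in the first case and title_length(*doc) in the second.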

Pass by unique_ptr

Pass the unique_ptr by value: void func(unique_ptr<MyClass> obj)
The caller must transfer ownership to func using std::move:
func now becomes responsible for managing the lifetime of obj

Here is an example

auto obj = std::make_unique<MyClass>(/* MyClass constructor arguments */);
func(std::move(obj));
// obj no longer owns the object (it is now nullptr) and must not be dereferenced
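
A compilable variant of the same idea, as a sketch. Job and run_and_discard are hypothetical stand-ins for MyClass and func.

```cpp
#include <memory>
#include <utility>

// Hypothetical type used for illustration.
struct Job {
    int id;
};

// Takes ownership: the Job is destroyed when this function returns.
int run_and_discard(std::unique_ptr<Job> job)
{
    return job ? job->id : -1;  // -1 signals that an empty pointer was passed
}
```

The caller writes run_and_discard(std::move(job)); afterwards job compares equal to nullptr.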

Other Methods

These notes don't explain these methods, but they are here for completeness

Mutable reference: void myfunc(MyClass & out)
- Like Python pass-by-reference
- An implementation of "Output Parameters"
- out can be modified in myfunc and those changes are seen outside the function
- Not usually needed because in C++ it is okay to Return By Value
rvalue reference: void myfunc(MyClass && rvalue)
forwarding reference: template<class T> void myfunc(T && myval)
Pointer: void myfunc(MyClass * p)
- Useful for interfacing with C
- The pointer can also be const

Returning Objects

In modern C++, there are powerful guarantees of copy elision
In many cases, returning an object by value will not incur additional costs
As for all performance-related questions: if you are concerned that returning by value is too costly, measure.

To return multiple objects by value use std::pair or std::tuple

std::tuple<int, char, double> stuff()
{
    // The return type is known so we can
    // call the constructor directly with braces
    return {1, 'b', 2.0};
}

// structured bindings let us get separate variables
// for each item in the tuple easily
auto [x, y, z] = stuff();
// x == 1
// y == 'b'
// z == 2.0

Logical `const`

In C++, const means logically constant: outside observers cannot see that the object has changed
Logical const is conceptually different from bitwise const: the internal state of an object is allowed to change
The status of const is enforced by the compiler
- const auto x = 2 means that the value of x will not change
- void myfunc(const MyClass & obj) means that
  - Only const member functions of obj can be called
  - public member variables of obj are treated as if declared const
- void MyClass::member() const
  - Within this method, the member variables are treated as if declared const
Effectively:
- Only const methods can be called on const objects
- const methods can't modify the object
- Therefore the compiler enforces const
In practice:
- It is possible to circumvent const
- The mutable keyword allows a member variable to be changed by a const function
  - This is okay only if the change is not observable outside the class and is thread safe
  - There are sometimes reasons to do this (e.g., caching computations), but it's much easier if you avoid it
- const_cast<> can be used to cast constness away
  - Basically never do this; modifying an object that was declared const is undefined behavior (such data can be stored in read-only memory, for example)
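
The caching use of mutable can be sketched as below. Circle and area_of are hypothetical names, and the sketch assumes single-threaded use: it does not satisfy the thread-safety condition stated above and would need synchronization to do so.

```cpp
// Hypothetical class: area() is logically const even though it updates a cache.
class Circle {
public:
    explicit Circle(double radius) : radius_(radius) {}

    double area() const
    {
        if (!cached_) {  // compute once, then reuse
            area_ = 3.141592653589793 * radius_ * radius_;
            cached_ = true;
            ++computations_;
        }
        return area_;
    }

    void set_radius(double radius)  // non-const: observably changes the object
    {
        radius_ = radius;
        cached_ = false;
    }

    int computations() const { return computations_; }

private:
    double radius_;
    mutable bool cached_ = false;   // NOT thread safe: single-threaded sketch only
    mutable double area_ = 0.0;
    mutable int computations_ = 0;  // exposed only to make the caching visible
};

// Only const member functions can be called through a reference to const.
double area_of(const Circle& circle)
{
    // circle.set_radius(1.0);  // would not compile: set_radius is not const
    return circle.area();
}
```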
