Commit d6b09aa6 authored by Praetorius, Simon's avatar Praetorius, Simon
Browse files

Section about advanced datatypes

parent 6e21e159
---
class: center, middle
# Advanced Datatypes
---
# Advanced Datatypes
## Type aliases
Often you do not always want to care about the actual datatype used in your program, e.g., if it is
`float` or `double` but instead give the types more reasonable names, e.g., `real`. In C++ you can
do this via `typedef`/`using`:
```c++
typedef <type> <alias_name>;
using <alias_name> = <type>; //< suggested
```
Afterwards, `<alias_name>` can be used like any other datatype:
```c++
using real_t = double;
using string_t = char*;
using matrix_t = real_t**; // pointer to pointer
string_t const str = "String";
matrix_t A = new real_t*[ 10 ];
real_t f = real_t( 3.1415926 );
```
---
# Advanced Datatypes
## Type aliases
> .h3[Remark:] A `real_t` datatype allows you to easily switch between `float` and
> `double` in your program.
To simplify the distinction between variables and datatypes, the following is strongly advised:
> .h3[Coding Principle:] Follow a strict convention in naming new types, e.g., with special
> prefix or suffix. Either use `_t` for your aliases, or specify type names always with CamelCase
> letters.
---
# Advanced Datatypes
## Predefined Types
The C++ library and the operating system usually define some abbreviations for often used types, e.g.,
- unsigned integer with explicit size:
`std::uint8`,
`std::uint16` and
`std::uint32` for 8, 16 and 32 bit
respectively,
- similar `std::int8`,
`std::int16` and
`std::int32` are defined for signed
integers,
- `std::size_t` : unsigned integer type for
holding size information best suited for the local hardware
> .h3[Codeing Principle:] Use `std::size_t` for representing container size information and indices.
---
# Advanced Datatypes
## Structures
Working with vectors and matrices always involved several variables, e.g., the size and the arrays.
That results in many arguments to functions and hence, to possible errors. It would be much better to
store all associated data together. That is done with `struct`:
```c++
struct <name> {
<type_1> <variable_1>;
<type_2> <variable_2>;
<type_2> <variable_3>;
};
```
By defining a `struct`, also a new datatype named `<name>` is defined.
---
# Advanced Datatypes
## Struct Example:
```c++
struct Vector {
std::size_t size;
real_t* data;
};
struct Matrix {
std::size_t nrows, ncols;
real_t* data;
};
// y += alpha * A * x
void mat_vec(real_t const alpha,
Matrix const& A,
Vector const& x,
Vector& y);
```
---
# Advanced Datatypes
## Initializing Structure Members
A variable of the datatype a `struct` represents can be created by using the universal initialization
```c++
Vector x1{10, new real_t[10]};
Vector x2{}; // default initialization of all members
Vector x3; // just definition, no initialization of the members
```
--
It is also possible to make a named initialization
```c++
Vector x4{.size = 10, .data = new real_t[10]};
```
--
Allocating an object of the type dynamically, can be done using the `new` operator:
```c++
Vector* x5 = new Vector{...}; // or without braces
```
---
# Advanced Datatypes
## Initializing and Accessing Structure Members
The individual variables in a struct are accessed via `.` operator, i.e.:
```c++
Vector x;
x.size = 10;
x.data = new real_t[ x.size ];
```
If a pointer to a struct is given, the access can be simplified. Instead
of `*` (deref) and `.`, the operator `->` is provided:
```c++
Vector* x = new Vector;
x->size = 10;
x->data = new real_t[ x->size ];
std::cout << (*x).size << std::endl; // alternative (note the operator precedence)
```
---
# Advanced Datatypes
# Accessing Structure
In case of a reference, e.g., `Vector&`, the standard `.` access has to be used:
```c++
Vector x{...};
Vector& y = x;
std::cout << y.size << std::endl;
std::cout << y.data[5] << std::endl;
```
---
# Advanced Datatypes
## Structures and Functions
Structures can be supplied as normal function arguments, either as call-by-value or call-by-reference:
```c++
double dot (Vector const x, Vector const y);
void set (Vector& y, real_t const alpha);
void axpy (real_t const alpha, Vector const& x, Vector& y);
```
--
When passed by value (or by `const&`) the function argument can be initialized directly using curly braces `{...}`:
```c++
double* a = ...;
double* b = ...;
double result = dot({10, a}, {.size=10, .data=b});
```
---
# Advanced Datatypes
## Structures and Functions
When using call-by-value, a copy of the complete struct is actually
created. For large structs, this can be a significant overhead:
```c++
struct QuadratureRule {
real_t points[ 100 ];
real_t weights[ 100 ];
};
template <class Func>
double quadrature (QuadratureRule const rule, Func f);
```
> .h3[Note:] When a static array is stored in a `struct` and passed by value, all the array elements
> are copied. Also when structures with array members are returned from functions all elements are copied.
---
# Advanced Datatypes
## Structures and Functions
In such cases, call-by-reference with a `const` reference argument is necessary to avoid
this overhead:
```c++
template <class Func>
double quadrature (QuadratureRule const& rule, Func f);
```
Here, only a single pointer is supplied to the function instead of 200
`real_t` values.
---
# Advanced Datatypes
## Arrays of Struct
Like any other datatype, structs can also be allocated in the form of an array:
```c++
struct Coord {
real_t x, y, z;
};
Coord coordinates[ 10 ];
```
for fixed sized array or
```c++
Coord* coordinates = new Coord[ 10 ];
```
using dynamic memory management.
---
# Advanced Datatypes
## Arrays of Structs
The access to struct members then comes after addressing the array entry:
```c++
#include <cmath>
...
for (std::size_t i = 0; i < 10; ++i)
{
coordinates[i].x = std::cos( real_t(i) * 36.0 * M_PI / 180.0 );
coordinates[i].y = std::sin( real_t(i) * 36.0 * M_PI / 180.0 );
coordinates[i].z = real_t(i) / 10.0;
}
```
---
# Advanced Datatypes
Structures can be nested, i.e., a struct can be a member of another struct:
## Example: Sparse Matrix in Coordinate Format
```c++
struct Triple {
std::size_t row;
std::size_t col;
real_t value;
};
struct TripleList {
std::size_t size;
Triple* data;
}
struct SparseMatrix {
std::size_t nrows;
std::size_t ncols;
TripleList nonzeros;
};
```
---
# Advanced Datatypes
## Example: Sparse Matrix in Coordinate Format
Laplace 5-point stencil
\[
A = \begin{pmatrix}
4 & -1 & & & & \\
-1 & 4 & -1 & & & \\
& -1 & \ddots & & & \\
& & -1 & 4 & -1 \\
& & &-1 & 4
\end{pmatrix}
\]
---
# Advanced Datatypes
## Example: Sparse Matrix in Coordinate Format
The nested structures can be initialized directly using nested curly braces:
```c++
SparseMatrix mat{10,10, {28, new Triple[28]}};
std::size_t idx = 0;
mat.nonzeros.data[idx++] = {0,0, 4.0};
for (std::size_t i = 1; i < 10; ++i) {
mat.nonzeros.data[idx++] = {i,i, 4.0};
mat.nonzeros.data[idx++] = {i-1,i, -1.0};
mat.nonzeros.data[idx++] = {i,i-1, -1.0};
}
assert(idx == 28);
```
---
# Advanced Datatypes
## Example: Sparse Matrix in Coordinate Format
### Matrix-Vector product
```c++
void mat_vec(real_t const alpha,
SparseMatrix const& A,
Vector const& x,
Vector& y)
{
for (std::size_t i = 0; i < A.nonzeros.size; ++i) {
auto const& entry = A.nonzeros[i];
y[entry.row] += alpha * entry.value * x[entry.col];
}
}
```
---
# Advanced Datatypes
## Example2: Sparse Matrix in Compressed Format
We can store sparse matrices even more memory efficient, with more locality. Three arrays:
- `indices`: stores column indices for all entries, sorted by row,
- `values`: stores all coefficients in same order as in `colind` and
- `offset`: stores at `offset[i]` the position of the first value corresponding to row `i` in the arrays
`indices` and `values`. The last field, contains the number of nonzeros.
This format is known as the *compressed row storage* format.
```c++
struct CRSMatrix {
std::size_t nrows, ncols;
std::size_t* offset;
std::size_t* indices;
real_t* values;
};
```
---
# Advanced Datatypes
## Example2: Sparse Matrix in Compressed Format
For the matrix
\[
A=\begin{pmatrix}
1 & & 3 & \\
& 2 & & -1 \\
-4 & -1 & 1 & \\
1 & & & 3
\end{pmatrix}
\]
the corresponding source code is:
```c++
std::size_t offset[] = { 0, 2, 4, 7, 9 }; // accumulated nr of entries per row
std::size_t indices[] = { 0, 2, 1, 3, 0, 1, 2, 0, 3 };
real_t values[] = { 1, 3, 2, -1, -4, -1, 1, 1, 3 };
CRSMatrix mat{4, 4, offset, indices, values};
```
---
# Advanced Datatypes
## Example2: Sparse Matrix in Compressed Format
### Matrix-Vector product
```c++
void mat_vec(real_t const alpha,
CRSMatrix const& A,
Vector const& x,
Vector& y)
{
for (std::size_t i = 0; i < A.nrows; ++i)
{
real_t f = 0.0;
std::size_t const lb = A.offset[ i ];
std::size_t const ub = A.offset[ i+1 ];
for (std::size_t j = lb; j < ub; ++j)
f += A.values[ j ] * x[ A.indices[ j ] ];
y[ i ] += alpha * f;
}
}
```
\ No newline at end of file
......@@ -430,13 +430,13 @@ There are high-level alternatives in the c++ standard library
---
# Arrays and Dynamic Memory
## `std::array<T,size>`
The problems with *static* arrays are:
- no size information
- decays to pointer when passed as argument
- no direct copy operation
## Alternative: `std::array<T,size>`
```c++
#include <array>
......@@ -454,13 +454,12 @@ void f2 (std::array<int,3>& arg); // pass-by-reference
---
# Arrays and Dynamic Memory
## `std::vector<T>`
The problems with *dynamic* arrays are:
- no size information
- no copy operation
- must be deleted manually
## Alternative: `std::vector<T>`
```c++
#include <cassert> // for assert(...)
#include <vector>
......@@ -479,13 +478,12 @@ void f2 (std::vector<int>& arg); // pass-by-reference
---
# Arrays and Dynamic Memory
## `std::span<T,Extend>`
- Arrays do not carry any size information
- `std::span` describes an object that can refer to a contiguous sequence of objects with the first
element of the sequence at position zero.
- Either *dynamic extend* (dynamic size information) or *static extend*
## Alternative: `std::span<T,Extend>`
```c++
#include <cassert> // for assert(...)
#include <span>
......@@ -505,11 +503,10 @@ assert(b_span.size() == 3);
---
# Arrays and Dynamic Memory
## `std::shared_ptr<T>` and `std::unique_ptr<T>`
The problems with dynamic allocated memory: ownership is unclear. Who is responsible for releasing the memory?
(1) Shared ownership. Memory is released when the last share is deleted:
## Alternative 1: `std::shared_ptr<T>`
Shared ownership. Memory is released when the last share is deleted:
```c++
#include <cassert>
......@@ -531,11 +528,10 @@ The problems with dynamic allocated memory: ownership is unclear. Who is respons
---
# Arrays and Dynamic Memory
## `std::shared_ptr<T>` and `std::unique_ptr<T>`
The problems with dynamic allocated memory: ownership is unclear. Who is responsible for releasing the memory?
(2) Unique ownership. Pointer cannot be copied.
## Alternative 2: `std::unique_ptr<T>`
Unique ownership. Pointer cannot be copied.
```c++
#include <cassert>
......@@ -569,7 +565,7 @@ double* row4 = new double[4]{4.0, 5.0, 6.0, 7.0};
double** mat1 = new double*[4]{row1, row2, row3, row3};
```
<img src="images/ptr-ptr.png" style="position:absolute; right:50px; bottom:100px;">
<img src="images/ptr-ptr.png" style="position:absolute; right:50px; bottom:150px;">
---
# Arrays and Dynamic Memory
......@@ -663,7 +659,7 @@ In theory, one could use any mapping. In practice, two different mappings are st
<img src="images/row-wise-col-wise.png">
]]
For a two-dimensional array, e.g., a matrix, with dimensions `nrows x ncols`, the mappings are for index `(i,j)`:
For a two-dimensional array, e.g., a matrix, with dimensions `nrows x ncols`, the mappings for index `(i,j)` are:
- **row-wise:** `i * ncols + j`,
- **column-wise:** `j * nrows + i`.
......@@ -793,15 +789,15 @@ using DynamicMatrix
- Some basic vector methods
```c++
// vec_i = value
void fill (DynamicVector vec, double const value) {
// vec_i = alpha
void set (DynamicVector vec, double const alpha) {
for (int i = 0; i < vec.size(); ++i)
vec[i] = value;
vec[i] = alpha;
}
// vec = factor * vec
void scale (DynamicVector vec, double const factor) {
// vec = vec * alpha
void scale (DynamicVector vec, double const alpha) {
for (int i = 0; i < vec.size(); ++i)
vec[i] *= factor;
vec[i] *= alpha;
}
// y = alpha * x + y
void axpy (double const alpha, DynamicVector x, DynamicVector y) {
......@@ -839,12 +835,12 @@ double two_norm (DynamicVector vec) {
- Some basic matrix methods
```c++
// mat_ij = value
void fill (DynamicMatrix mat, double const value) {
fill(mat.span(), value); // use vector method
// mat_ij = alpha
void set (DynamicMatrix mat, double const alpha) {
set(mat.span(), alpha); // use vector method
}
// mat = mat * factor
void scale (DynamicMatrix mat, double const factor) { ... }
void scale (DynamicMatrix mat, double const alpha) { ... }
// y = alpha * x + y
void axpy (double const alpha, DynamicMatrix x, DynamicMatrix y) { ... }
......@@ -913,13 +909,13 @@ void mat_mat (double const alpha, DynamicMatrix A, DynamicMatrix B, DynamicMatri
## Application: Basic linear-algebra - `blas.cc`
```c++
int main () {
double* data_A = make_matrix(10, 10);
auto data_A = make_matrix(10, 10);
auto A = DynamicMatrix(data_A, 10, 10);
double* data_x = make_vector(10);
auto data_x = make_vector(10);
auto x = DynamicVector(data_x, 10);
double* data_y = make_vector(10);
auto data_y = make_vector(10);
auto y = DynamicVector(data_y, 10);
fill(A, 1.0); fill(x, 1.0); fill(y, 0.0);
......@@ -934,3 +930,32 @@ int main () {
}
```
---
# Arrays and Dynamic Memory
## Application: Basic linear-algebra
- The **BLAS** (Basic Linear Algebra Subprograms) are routines that provide standard building blocks for
performing basic vector and matrix operations.
* The Level 1 BLAS perform scalar, vector and vector-vector operations,
* the Level 2 BLAS perform matrix-vector operations, and
* the Level 3 BLAS perform matrix-matrix operations.
See, e.g.,
- http://www.netlib.org/blas/
---
# Arrays and Dynamic Memory
## Application: Basic linear-algebra
- The **BLAS** (Basic Linear Algebra Subprograms) are routines that provide standard building blocks for
performing basic vector and matrix operations.
- **PETSc**, pronounced PET-see (the S is silent), is a suite of data structures and routines for
the scalable (parallel) solution of scientific applications modeled by partial differential equations.
See, e.g.,
- https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecSet.html
- https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecScale.html
- https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecAXPY.html
- https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecDot.html
......@@ -5,6 +5,7 @@
<meta charset="utf-8" />
<meta name="viewport"
content="user-scalable=no,initial-scale=1,maximum-scale=1,minimum-scale=1,width=device-width" />
<link rel="stylesheet" type="text/css" href="./css/{{ site.mode }}.css" />
<link rel="stylesheet" type="text/css" href="./css/pure-min.css" />
<link rel="stylesheet" type="text/css" href="./css/typo.css" />
<link rel="stylesheet" type="text/css" href="./css/spaces.css" />
......@@ -44,6 +45,7 @@ class: tud-light, typo
{% include functions.md %}
{% include git.md %}
{% include arrays-and-dynamic-memory.md %}
{% include advanced-datatypes.md %}
</textarea>
<div id="slideshow"></div>
......
......@@ -126,13 +126,13 @@ void mat_mat (double const alpha, DynamicMatrix A, DynamicMatrix B, DynamicMatri
int main ()
{
double* data_A = make_matrix(10, 10);
auto data_A = make_matrix(10, 10);
auto A = DynamicMatrix(data_A, 10, 10);
double* data_x = make_vector(10);