High Performance Cross Platform Architecture
// square.h
class square : public shape {
int side_length;
public:
void draw() const override;
};
OCP: The Original Open–Closed Principle
• If code has been made public, changes should not affect existing code.
• Re-use is through direct inheritance, not delegated polymorphism.
• Has been unrealistic in C++
The principle states that a good module structure should be both closed
and open:
• Closed, because clients need the module’s services to proceed with
their own development, and once they have settled on a version of the
module should not be affected by the introduction of new services they
do not need.
• Open, because there is no guarantee that we will include right from the
start every service potentially useful to some client.
Flat layout (all platforms in one directory):
plt/math
    Quat.h
    Quat_Common.h
    Quat_Neon32.h
    Quat_Sse.h
    Quat_Sse2.h
    …
Layout with one subdirectory per platform:
plt/math
    Quat.h
    Common/
        Quat_Common.h
        Vec_Common.h
        Mtx_Common.h
        …
    Neon32/
        Quat_Neon32.h
        Vec_Neon32.h
        Mtx_Neon32.h
        …
Including Headers
• Each feature has its own header file that holds the common definitions
• The header file is responsible for including the platform-specific code
• A series of preprocessor macros builds the name of the header file to
include
• Example:
INCLUDE_SIMD(Quat)
becomes:
"Quat_Sse2.h"
Header Inclusion Macros
• #define INCLUDE_PLT(Feature, File) INCLUDE_BUILD_FILENAME(Feature, File)
• #define INCLUDE_PLT_FEATURE(Feature) INCLUDE_STRINGIZE(Feature.h)
• #define INCLUDE_BUILD_FILENAME(Feature, File) INCLUDE_STRINGIZE(File ## _ ## Feature.h)
• #define INCLUDE_STRINGIZE(String) #String
#include INCLUDE_PLT_LOCAL(PLT_SIMD)
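The macros above can be checked without any platform headers on disk by turning the generated file name into a string constant. This is a minimal sketch under two assumptions that the slides do not show: PLT_SIMD is defined (for example by the build system) as the name of the active SIMD tag, and INCLUDE_SIMD is a thin wrapper that passes PLT_SIMD to INCLUDE_PLT.

```cpp
#include <cassert>
#include <string_view>

// Assumption: normally supplied by the build system for the target platform.
#define PLT_SIMD Sse2

#define INCLUDE_STRINGIZE(String) #String
#define INCLUDE_BUILD_FILENAME(Feature, File) INCLUDE_STRINGIZE(File ## _ ## Feature.h)
// The extra level of indirection lets PLT_SIMD expand to Sse2 before pasting.
#define INCLUDE_PLT(Feature, File) INCLUDE_BUILD_FILENAME(Feature, File)
#define INCLUDE_PLT_FEATURE(Feature) INCLUDE_STRINGIZE(Feature.h)
// Hypothetical wrapper: the slides use INCLUDE_SIMD but do not define it.
#define INCLUDE_SIMD(File) INCLUDE_PLT(PLT_SIMD, File)

// The same token sequence that would follow #include, captured as a string.
constexpr std::string_view kQuatHeader = INCLUDE_SIMD(Quat);
constexpr std::string_view kFeatureHeader = INCLUDE_PLT_FEATURE(PLT_SIMD);

static_assert(kQuatHeader == "Quat_Sse2.h");
static_assert(kFeatureHeader == "Sse2.h");
```

In a real header the result would be consumed directly, as in `#include INCLUDE_SIMD(Quat)`. The indirection through INCLUDE_PLT matters because operands of `##` are not macro-expanded, so pasting PLT_SIMD directly would produce `Quat_PLT_SIMD.h`.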
Simd_Common.h: The SIMD Feature Header
namespace plt::simd
{
struct Common {};
}
Quaternions, Mathematically Speaking (1/2)
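• Four-dimensional complex number:
$w + x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$
• Where:
$\mathbf{i}^2 = \mathbf{j}^2 = \mathbf{k}^2 = \mathbf{i}\mathbf{j}\mathbf{k} = -1$
$\mathbf{i}\mathbf{j} = \mathbf{k} = -\mathbf{j}\mathbf{i}$
$\mathbf{j}\mathbf{k} = \mathbf{i} = -\mathbf{k}\mathbf{j}$
$\mathbf{k}\mathbf{i} = \mathbf{j} = -\mathbf{i}\mathbf{k}$
• Addition:
$a + b = (a_w + b_w) + (a_x + b_x)\mathbf{i} + (a_y + b_y)\mathbf{j} + (a_z + b_z)\mathbf{k}$
• Multiplication:
$$\begin{aligned}
ab = {} & (a_w b_w - a_x b_x - a_y b_y - a_z b_z) \\
 & + (a_w b_x + a_x b_w + a_y b_z - a_z b_y)\,\mathbf{i} \\
 & + (a_w b_y - a_x b_z + a_y b_w + a_z b_x)\,\mathbf{j} \\
 & + (a_w b_z + a_x b_y - a_y b_x + a_z b_w)\,\mathbf{k}
\end{aligned}$$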
Quaternions, Mathematically Speaking (2/2)
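• Conjugate:
$q^* = w - x\mathbf{i} - y\mathbf{j} - z\mathbf{k}$
• Dot Product:
$a \cdot b = a_w b_w + a_x b_x + a_y b_y + a_z b_z$
• Norm:
$\lVert q \rVert = \sqrt{q \cdot q} = \sqrt{w^2 + x^2 + y^2 + z^2}$
• Multiplicative Inverse:
$q^{-1} = \dfrac{q^*}{\lVert q \rVert^2} = \dfrac{q^*}{q \cdot q}$
• Division:
$\dfrac{a}{b} = a b^{-1} = \dfrac{a b^*}{b \cdot b}$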
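As a cross-check of the formulas on these two slides, here is a minimal scalar sketch in C++. RefQuat and the free-function names are hypothetical and exist only for this illustration; they are not the library's types.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical plain-struct quaternion, used only to exercise the formulas.
struct RefQuat { double w, x, y, z; };

// Hamilton product, term by term as in the multiplication formula.
constexpr RefQuat Mul(RefQuat a, RefQuat b)
{
    return {
        a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z,
        a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
        a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
        a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w
    };
}

constexpr RefQuat Conj(RefQuat q) { return {q.w, -q.x, -q.y, -q.z}; }

constexpr double Dot(RefQuat a, RefQuat b)
{
    return a.w * b.w + a.x * b.x + a.y * b.y + a.z * b.z;
}

// Norm is the square root of the dot product of q with itself.
inline double Norm(RefQuat q) { return std::sqrt(Dot(q, q)); }

// Inverse is the conjugate divided by q . q (undefined for the zero quaternion).
constexpr RefQuat Inverse(RefQuat q)
{
    const double d = Dot(q, q);
    const RefQuat c = Conj(q);
    return {c.w / d, c.x / d, c.y / d, c.z / d};
}

// Division is multiplication by the inverse.
constexpr RefQuat Div(RefQuat a, RefQuat b) { return Mul(a, Inverse(b)); }
```

Multiplying the basis elements gives a quick sanity check: `Mul(i, j)` yields k and `Mul(j, i)` yields -k, which matches the non-commutative identities listed on the first slide.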
Quaternion Concepts in Mathematics vs. C++20
• Mathematically, a quaternion is defined by its data and operations
• As a C++ concept, the quaternion is defined purely by its data
• The common implementation of the operations relies only upon the concepts
• Optimized implementations conform to the concepts and the common
implementations
C++ Quaternion Concept
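template<typename Q>
concept Quaternion = requires(Q q, typename Q::Scalar t, typename Q::Scalar u)
{
    typename Q::Scalar;
    requires Arithmetic<typename Q::Scalar>; // nested requirement, so the concept is evaluated
    // The scalar type must support the four basic operations.
    { t + u };
    { t - u };
    { t * u };
    { t / u };
};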
“Standard” Quaternion Type: Declaration and Data
template<typename S, typename I = plt::simd::PLT_SIMD>
class Quat
{
public:
    using Scalar = S;

private:
    Scalar w_, x_, y_, z_;

public:
    // ... Constructors
    // ... Accessors
};
“Standard” Quaternion Type: Constructors
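Quat() = default;
template<Quaternion Q>
    requires std::convertible_to<typename Q::Scalar, Scalar>
Quat(const Q& rhs)
    noexcept(std::is_nothrow_convertible_v<typename Q::Scalar, Scalar>)
    // static_cast instead of Scalar{...}: brace initialization rejects
    // narrowing conversions such as double to float
    : Quat(static_cast<Scalar>(rhs.w()), static_cast<Scalar>(rhs.x()),
           static_cast<Scalar>(rhs.y()), static_cast<Scalar>(rhs.z()))
{}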
“Standard” Quaternion Type: Assignment Operator
template<Quaternion Q>
    requires std::convertible_to<typename Q::Scalar, Scalar>
Quat& operator=(const Q& rhs)
    noexcept(std::is_nothrow_convertible_v<typename Q::Scalar, Scalar>)
{
    // static_cast instead of Scalar{...} so that narrowing conversions compile
    w_ = static_cast<Scalar>(rhs.w());
    x_ = static_cast<Scalar>(rhs.x());
    y_ = static_cast<Scalar>(rhs.y());
    z_ = static_cast<Scalar>(rhs.z());
    return *this;
}
“Standard” Quaternion Type: Accessors
const Scalar& w() const noexcept { return w_; }
const Scalar& x() const noexcept { return x_; }
const Scalar& y() const noexcept { return y_; }
const Scalar& z() const noexcept { return z_; }
General Operation Implementation
• Use expression trees
• Runs anywhere
• Works for any arithmetic scalar type
• Parameters defined using concepts
What is an Expression Tree? An Example
• Expression: q1+q2*q3
• Operators construct nodes in the tree
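• Expression results in a tree: + at the root, with q1 as its left child and
* as its right child; * in turn has the children q2 and q3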
Addition Example: Operator
template<Quaternion QL, Quaternion QR>
inline auto operator+(const QL & lhs, const QR & rhs) noexcept -> QuaternionAddition<QL, QR>
{
return QuaternionAddition(lhs, rhs);
}
Addition Example: Quaternion Binary Expression
template<Quaternion QL, Quaternion QR>
requires MutuallyArithmetic<typename QL::Scalar, typename QR::Scalar>
class QuaternionBinaryExpr : public QuaternionExpr
{
using SL = typename QL::Scalar;
using SR = typename QR::Scalar;
public:
using Scalar = typename std::common_type<SL, SR>::type;
};
public:
using Scalar = QuaternionBinaryType<QL, QR>;
#include <concepts> // std::derived_from
#include <type_traits>
#include "simd.h"
namespace plt::simd
{
struct Neon32 : Common {};
template<typename SIMD>
concept Neon32Family = std::derived_from<SIMD, Neon32>;
}
Quat_Neon32.h: Quat<float, Neon> Class Declaration and Data
#include <arm_neon.h>
template<>
class Quat<float, plt::simd::Neon32>
{
float32x4_t value_;
public:
using Scalar = float;
// ... Constructors
// ... Accessors
};
Quat_Neon32.h: Quat<float, Neon> Constructors
Quat() = default;
template<Quaternion Q>
Quat(const Q& rhs)
: Quat(static_cast<Scalar>(rhs.w()), static_cast<Scalar>(rhs.x()),
static_cast<Scalar>(rhs.y()),
static_cast<Scalar>(rhs.z()))
{}
#include <concepts> // std::derived_from
#include <type_traits>
#include "simd.h"
namespace plt::simd
{
struct Sse : Common {};
template<typename SIMD>
concept SseFamily = std::derived_from<SIMD, Sse>;
}
Quat_Sse.h: Quat<float, Sse> Declaration and Constructors
#include <immintrin.h>
template<>
class Quat<float, plt::simd::Sse>
{
__m128 value_;
public:
using Scalar = float;
// ...
#include "Sse.h"
namespace plt::simd
{
struct Sse2 : Sse {};
template<typename SIMD>
concept Sse2Family = SseFamily<SIMD> && std::derived_from<SIMD, Sse2>;
}
Quat_Sse2.h: Quat<float, Sse2> Class
#include "Quat_Sse.h"
template<>
class Quat<float, plt::simd::Sse2> : public Quat<float, plt::simd::Sse>
{
using Quat<float, plt::simd::Sse>::Quat;
};
Quat_Sse2.h: Quat<double, Sse2> Class Declaration and Data
template<>
class Quat<double, plt::simd::Sse2>
{
__m128d wx_;
__m128d yz_;
public:
using Scalar = double;
// ... Constructors
// ... Accessors
};
Quat_Sse2.h: Quat<double, Sse2> Class Constructors
Quat() = default;
template<Quaternion Q>
Quat(const Q& rhs)
: Quat(static_cast<Scalar>(rhs.w()), static_cast<Scalar>(rhs.x()), static_cast<Scalar>(rhs.y()),
static_cast<Scalar>(rhs.z()))
{}
template<plt::simd::Sse3Family SIMD>
inline auto Dot(Quat<double, SIMD> lhs, Quat<double, SIMD> rhs) -> double
{
    __m128d w2x2 = _mm_mul_pd(lhs.SseWx(), rhs.SseWx()); // w*w, x*x products
    __m128d y2z2 = _mm_mul_pd(lhs.SseYz(), rhs.SseYz()); // y*y, z*z products
    __m128d add1 = _mm_hadd_pd(w2x2, y2z2);              // w+x terms, y+z terms
    __m128d add2 = _mm_hadd_pd(add1, add1);              // full sum in both lanes
    return _mm_cvtsd_f64(add2);                          // extract the low lane
}
• When Sse2Derived is true, SseFamily is also true (but not
necessarily the reverse); however, it does not subsume SseFamily
template<typename SIMD>
concept Sse2Derived = std::derived_from<SIMD, Sse2>;
Feature Revision Without Subsumption
• The first multiplication function is defined in SSE
• A more-optimized one is implemented in SSE3
• Assume Sse3Family is defined only in terms of derivation from the Sse3 tag
• This will fail to compile with an ambiguous overload error
• The valid parameter types of Sse3Family are a strict subset of those
accepted by SseFamily
• But the concepts are unrelated and thus ambiguous
template<typename SIMD>
concept Sse3Family = std::derived_from<SIMD, Sse3>;
template<plt::simd::SseFamily SIMD>
inline auto operator*(Quat<float, SIMD> lhs, Quat<float, SIMD> rhs) -> Quat<float, SIMD>;
template<plt::simd::Sse3Family SIMD>
inline auto operator*(Quat<float, SIMD> lhs, Quat<float, SIMD> rhs) -> Quat<float, SIMD>;
Feature Revision With Subsumption
• The Sse3Family concept refers to Sse2Family and thus subsumes it
template<typename SIMD>
concept Sse3Family = Sse2Family<SIMD> && std::derived_from<SIMD, Sse3>;
• https://github.com/noahstein/Ark