[Libomptarget] Allow the device runtime to be compiled for the host

Currently the OpenMP offloading device runtime is only expected to be
compiled for the specific architecture it targets. This is problematic
if we want to make compiling the device runtime more general by using
the standard `clang` driver rather than invoking the clang front-end
directly, since the driver also compiles the source for the host. This
patch addresses this primarily by changing the `declare target` region
to `device_type(nohost)` so the host will not contain any of this code.
Additionally, we forward declare the functions that are defined via
variants; otherwise they would be undeclared on the host, where neither
the AMDGCN nor the NVPTX variant matches.

Reviewed By: jdoerfert, tianshilei1992

Differential Revision: https://reviews.llvm.org/D125260
diff --git a/openmp/libomptarget/DeviceRTL/src/Mapping.cpp b/openmp/libomptarget/DeviceRTL/src/Mapping.cpp
index 21104be..48ca13a 100644
--- a/openmp/libomptarget/DeviceRTL/src/Mapping.cpp
+++ b/openmp/libomptarget/DeviceRTL/src/Mapping.cpp
@@ -15,7 +15,7 @@
 #include "Types.h"
 #include "Utils.h"
 
-#pragma omp declare target
+#pragma omp begin declare target device_type(nohost)
 
 #include "llvm/Frontend/OpenMP/OMPGridValues.h"
 
@@ -24,6 +24,23 @@
 namespace _OMP {
 namespace impl {
 
+// Forward declarations of functions defined for AMDGCN and NVPTX.
+const llvm::omp::GV &getGridValue();
+uint32_t getGridDim(uint32_t n, uint16_t d);
+uint32_t getWorkgroupDim(uint32_t group_id, uint32_t grid_size,
+                         uint16_t group_size);
+uint32_t getNumHardwareThreadsInBlock();
+LaneMaskTy activemask();
+LaneMaskTy lanemaskLT();
+LaneMaskTy lanemaskGT();
+uint32_t getThreadIdInWarp();
+uint32_t getThreadIdInBlock();
+uint32_t getKernelSize();
+uint32_t getBlockId();
+uint32_t getNumberOfBlocks();
+uint32_t getWarpId();
+uint32_t getNumberOfWarpsInBlock();
+
 /// AMDGCN Implementation
 ///
 ///{