Koan-Sin Tan,
freedom@computer.org
COSCUP, Aug 2nd, 2020
TensorFlow Runtime
A Peek into the Future of TensorFlow
1
• disclaimer: opinions are my own

• feel free to interrupt me if you have any questions during the presentation

• questions can be in Taiwanese, English, or Mandarin

• most of the TFRT materials are adapted from the TFRT Deep Dive in the MLIR design meeting [1] and the TFRT docs [2]

• code around Aug 1, 2020 (git commit ecf1c20 [3])

[1] TFRT Deep Dive, slides - recording, https://mlir.llvm.org/talks/

[2] https://github.com/tensorflow/runtime/tree/master/documents

[3] https://github.com/tensorflow/runtime/commit/ecf1c20
2
who i am
• Used open source before the term “open source” was coined
• A software guy; learned to use Unix and open-source software on a VAX-11/780 running 4.3BSD
• Used to be a programming-language junkie
• Worked on various system software, e.g., CPU scheduling and power management of non-CPU components
• Recently, working on NN performance on edge devices and related stuff
• Contributed from time to time to TensorFlow Lite
• started a command-line label_image for TFLite
https://gunkies.org/w/images/c/c1/DEC-VAX-11-780.jpg
3
What is TFRT
• TensorFlow Runtime (TFRT) is one of the two new MLIR-based runtimes that have emerged in 2020 so far

• The other one is the Intermediate Representation Execution Environment (IREE). So far, TFRT seems to have better design documentation

• Both of them have mobile / edge environments in mind

• I haven't seen mobile acceleration code in TFRT yet

• IREE already has some Vulkan-related code, and some simple code works on Android

• ResNet GPU inference is reported to be 28% faster with TFRT

• https://github.com/tensorflow/runtime, https://youtu.be/15tiQoPpuZ8
4
Build it
• if you follow the instructions described in README.md, it should just work, at least on x86_64 Linux

• however, it's not tested on non-Linux environments yet

• ssize_t and int64_t (see the sketch below)

• on Mac OS X: ssize_t is long, int64_t is long long
• the current code mixes the use of ssize_t and int64_t

• tests: one of the acclaimed features of TFRT, like MLIR, is its use of LLVM FileCheck

• with my hacks, shape-related (ssize_t) tests are not fixed yet

• it's not tested on non-x86 platforms, such as aarch64, either
5
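To illustrate the ssize_t/int64_t mismatch above: on Mac OS X both types are 64-bit, but ssize_t is long while int64_t is long long, so they are distinct C++ types and pointer/overload mismatches show up that do not exist on x86_64 Linux (where both are long). A minimal standalone sketch, not code from the TFRT tree:

#include <cstdint>
#include <sys/types.h>

// Stand-in for a shape helper that takes an int64_t dimension array.
static int64_t NumElements(const int64_t* dims, int rank) {
  int64_t n = 1;
  for (int i = 0; i < rank; ++i) n *= dims[i];
  return n;
}

int main() {
  int64_t dims_i64[2] = {2, 3};  // int64_t: 'long' on Linux, 'long long' on Mac OS X
  ssize_t dims_ss[2] = {2, 3};   // ssize_t: 'long' on both

  int64_t ok = NumElements(dims_i64, 2);     // fine everywhere
  // int64_t bad = NumElements(dims_ss, 2);  // compiles on x86_64 Linux, fails on
  //                                         // Mac OS X: 'long *' vs 'long long *'
  return static_cast<int>(ok);
}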
• The three key directories under the TFRT root directory are

• lib: Contains core TFRT infrastructure code

• backends: Contains device specific infrastructure and op/kernel implementations

• include: Contains public header files for core TFRT infrastructure
6
Walking thru the tutorial
• unfortunately, it seems it’s not easy to jump directly into source code without having
some background knowledge

• so we’ll walk thru the tutorial [1]

• What's in the tutorial:

• print hello world

• print integer

• adding kernels

[1] https://github.com/tensorflow/runtime/blob/master/documents/tutorial.md
7
using tfrt and tfrt_test
hello.mlir
func @hello() {
%chain = tfrt.new.chain
// Create a string containing "hello world" and store it in %hello.
%hello = "tfrt_test.get_string"() { string_attr = "hello world" } : () -> !tfrt.string
// Print the string in %hello.
"tfrt_test.print_string"(%hello, %chain) : (!tfrt.string, !tfrt.chain) -> !tfrt.chain
tfrt.return
}
The @hello function above shows how to create and print a string. The text after each ':' specifies the types involved:

• ()->!tfrt.string means that tfrt_test.get_string takes no arguments and returns a !tfrt.string. tfrt is an MLIR dialect prefix (or namespace) for TFRT

• (!tfrt.string, !tfrt.chain) -> !tfrt.chain means that tfrt_test.print_string takes two arguments (!tfrt.string and !tfrt.chain) and returns a !tfrt.chain. A chain [1] is a TFRT abstraction to manage dependencies

[1] https://github.com/tensorflow/runtime/blob/master/documents/explicit_dependency.md
8
hello world in MLIR
func @stringconstant() -> !llvm<"[12 x i8]"> {
%1 = llvm.constant("Hello world!") : !llvm<"i8*">
// CHECK: ret [12 x i8] c"Hello world!"
llvm.return %1 : !llvm<"i8*">
}
func @main() {
%0 = llvm.constant(0) : !llvm.i64
%1 = call @stringconstant() : () -> !llvm<"[12 x i8]">
%2 = llvm.getelementptr %1[%0] : (!llvm<"[12 x i8]">, !llvm.i64) -> !llvm<"i8*">
%3 = llvm.bitcast %2 : !llvm<"i8*"> to !llvm<"i8*">
%32 = llvm.call @puts(%2) : (!llvm<"i8*">) -> !llvm.i32
return
}
func @puts(!llvm<"i8*">) -> !llvm.i32
• the MLIR “standard dialect” doesn't have I/O functions

• there is the LLVM dialect, though, so of course we can use it to call a standard libc function such as puts
9
Hello integer
func @hello_integers() {
%chain = tfrt.new.chain
// Create an integer containing 42.
%forty_two = tfrt.constant.i32 42
// Print 42.
tfrt.print.i32 %forty_two, %chain
tfrt.return
}
• as stated in the tutorial, we can run other functions in the same module

• we can turn to more basic ones, such as integers or floating point numbers

• @hello_integers shows how to create and print integers

• This example does not have the verbose type information we saw in @hello because there are custom parsers for the tfrt.constant.i32 and tfrt.print.i32 kernels in basic_kernels.td
10
basic_kernels.td
• .td (“target description”) files are input for LLVM TableGen [1]

[1] TableGen, https://llvm.org/docs/TableGen/
class ConstantOp<string suffix, Type baseType, Attr attr>
: TFRT_Op<"constant." # suffix, [NoSideEffect]> {
let summary = "host executor constant value constructor";
let arguments = (ins attr:$value);
let results = (outs baseType);
}
class PrintOp<string suffix, Type type> : TFRT_Op<"print." # suffix> {
let summary = "tfrt.print operation";
let description = [{
An operation takes a number input and a chain input.
It prints the number to stdout and returns a chain output.
The chain input must be the second operand.
Example:
%2 = tfrt.print.i32 %0, %1
}];
let arguments = (ins type, TFRT_ChainType);
let results = (outs TFRT_ChainType);
let assemblyFormat = "operands attr-dict";
let verifier = ?;
}
https://github.com/tensorflow/runtime/blob/master/include/tfrt/basic_kernels/opdefs/basic_kernels.td#L376-L390
https://github.com/tensorflow/runtime/blob/master/include/tfrt/basic_kernels/opdefs/basic_kernels.td#L58-L64
11
Define kernels
12
user defined kernels
func @print_coordinate() {
%chain = tfrt.new.chain
%two = tfrt.constant.i32 2
%four = tfrt.constant.i32 4
%coordinate = "my.create_coordinate"(%two, %four) : (i32, i32) -> !my.coordinate
"my.print_coordinate"(%coordinate, %chain) : (!my.coordinate, !tfrt.chain) -> !tfrt.chain
tfrt.return
}
coordinate.mlir shows several TFRT features:

• MLIR types that begin with exclamation mark (!) are user-defined types like !my.coordinate,
compared to built-in types like i32

• Kernels are just C++ functions with a name in MLIR: my.print_coordinate is the MLIR name for
the C++ PrintCoordinate function

• Kernels may pass arbitrary user-defined types: my.create_coordinate passes a custom Coordinate struct to my.print_coordinate (see the C++ sketch below)
13
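For reference, the C++ side of these kernels looks roughly like the sketch below. It is adapted from the TFRT tutorial and hedged: the exact signatures (e.g., whether the chain is an explicit argument), the header paths, and the tfrt::outs() helper should be checked against the current tree.

#include <cstdint>
#include "tfrt/host_context/chain.h"  // tfrt::Chain (path assumed)
#include "tfrt/support/ostream.h"     // tfrt::outs() (path assumed)

// A user-defined type passed between kernels as an AsyncValue payload.
struct Coordinate {
  int32_t x = 0;
  int32_t y = 0;
};

// C++ kernel behind the MLIR name "my.create_coordinate":
// takes two i32 arguments and returns a Coordinate by value.
static Coordinate CreateCoordinate(int32_t x, int32_t y) {
  return Coordinate{x, y};
}

// C++ kernel behind "my.print_coordinate": takes the Coordinate and a Chain,
// prints the coordinate, and returns a Chain so side effects stay ordered.
static tfrt::Chain PrintCoordinate(Coordinate coordinate, tfrt::Chain chain) {
  tfrt::outs() << "(" << coordinate.x << ", " << coordinate.y << ")\n";
  return chain;
}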
to dig into some code we need
more system information
14
Host Runtime
15
• a TensorFlow user passes into TFRT a TensorFlow graph created via high-level TensorFlow APIs, and

• TFRT then calls the MLIR-based graph
compiler to optimize and lower the
graph into BEF, a Binary Executable
Format for TFRT graph execution (MLIR
is the compiler infrastructure that we
use to represent TFRT host programs). 

• The blue arrows in the simplified
TensorFlow training stack diagram
show this flow.
16
• In the README.md we are told to build two binaries: tfrt_translate and bef_executor

• tfrt_translate

• The tfrt_translate program does round trip
translation between MLIR and BEF, similar
to an assembler and disassembler.

• bef_executor

• The bef_executor program is the
execution driver of BEF files. It reads in a
BEF file, sets up runtime, and
asynchronously executes function(s) in
that file.
17
TFRT Host Runtime
• Foundation of TFRT: schedules work on the host and devices

• Clean separation between host and device runtimes:

• Host runtime does not know anything about devices, just their runtimes (sets of kernels) 

• Key design points:

• Fully asynchronous - kernel executions cannot block

• Excellent error propagation in the presence of asynchrony

• Performance as a first-class concern, for graph and eager

• Outline:

• Common runtime infrastructure

• Graph execution

• Op-by-op execution (“eager”)
18
Key Abstraction: AsyncValue
• Container for data or resources

• Not Tensor specific

• A “future” type, fulfilled with exactly one value, or an error

• Lock-free, low memory overhead, type erased, reference counted

• Helper class AsyncValueRef<T> provides type safety when the contained type is known

• AsyncValues enable efficient asynchronous compute

• Asynchronous functions return unavailable AsyncValues

• Caller can schedule dependent computations with AsyncValue::AndThen() (see the sketch below)

• Caller need not block until the AsyncValue becomes available

https://github.com/tensorflow/runtime/blob/master/include/tfrt/host_context/async_value.h
19
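As a rough illustration of the AsyncValue contract above (a producer returns an unavailable value, the consumer uses AndThen() instead of blocking), here is a hedged C++ sketch. The AsyncValueRef methods (emplace, AndThen, CopyRef, IsError, get) follow async_value_ref.h, but the factory helper and the HostContext::EnqueueWork signature are assumptions to be checked against the tree.

#include <utility>
#include "tfrt/host_context/async_value_ref.h"
#include "tfrt/host_context/host_context.h"

namespace example {

// Producer: returns an AsyncValueRef<int> that is not yet available.
// A worker thread fulfills it later with emplace() (or sets an error).
tfrt::AsyncValueRef<int> ExpensiveComputation(tfrt::HostContext* host) {
  auto result = tfrt::MakeUnconstructedAsyncValueRef<int>(host);
  host->EnqueueWork([result = result.CopyRef()]() mutable {
    result.emplace(42);  // fulfill the "future" with exactly one value
  });
  return result;
}

// Consumer: schedules dependent work with AndThen() instead of blocking.
void UseResult(tfrt::AsyncValueRef<int> value) {
  auto captured = value.CopyRef();
  value.AndThen([captured = std::move(captured)] {
    if (captured.IsError()) return;  // errors propagate through AsyncValues
    int v = captured.get();          // safe: the value is available here
    (void)v;
  });
}

}  // namespace example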
Kernels
• Kernel: unit of computation scheduled by the runtime

• Similar to kernel concept in current TensorFlow

• Kernels accept AsyncValue inputs and produce AsyncValue outputs

• Runtime coordinates dataflow of AsyncValues between kernels

• Outputs may not be immediately available, unlike current TensorFlow

• Runtime generally does not understand kernel semantics
// Kernel that adds two integers.
// AsyncKernelFrame holds the kernel's arguments and results.
static void TFRTAdd(AsyncKernelFrame* frame) {
  // Fetch the kernel's 0th argument.
  AsyncValue* arg1 = frame->GetArgAt(0);
  // Fetch the kernel's 1st argument.
  AsyncValue* arg2 = frame->GetArgAt(1);
  int v1 = arg1->get<int>();
  int v2 = arg2->get<int>();
  // Set the kernel's 0th result.
  frame->EmplaceResultAt<int>(0, v1 + v2);
}
https://github.com/tensorflow/runtime/blob/master/documents/tfrt_host_runtime_design.md
https://github.com/tensorflow/runtime/blob/master/lib/basic_kernels/integer_kernels.cc#L39-L45
https://github.com/tensorflow/runtime/blob/master/include/tfrt/host_context/kernel_utils.h#L61-L149
20
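Kernels such as TFRTAdd above become callable from MLIR/BEF programs by registering them under an MLIR kernel name. The sketch below is hedged: TFRT_KERNEL and KernelRegistry::AddKernel follow kernel_utils.h and the basic-kernels registration code, but the header paths, the example kernel, and the registration function name here are assumptions.

#include "tfrt/host_context/kernel_registry.h"  // path assumed
#include "tfrt/host_context/kernel_utils.h"

namespace example {

// The same add kernel, written against the typed wrappers from kernel_utils.h
// instead of fetching arguments from AsyncKernelFrame by hand.
static int32_t ExampleAddI32(int32_t a, int32_t b) { return a + b; }

// Hypothetical registration entry point: maps MLIR kernel names to C++
// functions so the BEF executor can look them up by name.
void RegisterExampleKernels(tfrt::KernelRegistry* registry) {
  registry->AddKernel("example.add.i32", TFRT_KERNEL(ExampleAddI32));
}

}  // namespace example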
Host Program
• Host programs encode a dataflow graph

• Similar to GraphDef in current TensorFlow

• Expressed in MLIR. Typically compiler generated

• Designed for low-level dispatch efficiency

• Designed for compiler transformations and analysis, e.g., 

• Use dataflow analysis for buffer reuse
func @sample_function() -> i32 {
%one = tfrt.constant.i32 1 // Make AsyncValue with value 1
%two = tfrt.constant.i32 2 // Make AsyncValue with value 2
%three = tfrt.add.i32 %one, %two // Make AsyncValue with value 3 (1+2)
%ch0 = tfrt.new.chain
tfrt.print.i32 %three, %ch0 // Print AsyncValue %three
tfrt.return %three : i32 // Return AsyncValue %three
}
21
TFRT Binary Executable Format (BEF)
• BEF encodes a hardware-specific lowered graph
function

• Primary interface between compiler and runtime 

• Designed for efficient execution

• Low overhead: execute program by reading mmap’d
byte array 

• Persistent and stable: Compile once offline, run many times online. Great for inference use-cases

• Composed of sections, similar to ELF. Each section
has its own format 

• Extensible: BEF is versioned, reader ignores unknown
sections, new versions may define new sections 
https://github.com/tensorflow/runtime/blob/master/documents/binary_executable_format.md
22
BEF Executor
• BEF Executor evaluates a BEF dataflow graph “executor” style:

• Not a bytecode-like interpreter: no concept of program counter

• “Strict” execution by default: run a kernel only when all its inputs are available

• Executor features:

• Lock-free: atomics instead of mutexes

• Non-blocking: defer dependent work with AsyncValue::AndThen

• Supports “non-strict” execution: may run a kernel when some of its
inputs are available

• Good for efficiently forwarding unavailable inputs to outputs

• Key concepts:

• BEF: dataflow graph

• Kernel: dataflow node

• AsyncValues: dataflow edge
https://github.com/tensorflow/runtime/blob/master/lib/bef_executor/bef_interpreter.cc#L223-L254
23
Host Runtime Summary 

24
How about Core Runtime?
• Surely, we could do a similar walkthrough, but that would take more time

• Two things

• Op Execution API, Execute()

• BEF Executor can handle it too
void CoreRuntime::Impl::Execute(const ExecutionContext& exec_ctx,
                                string_view op_name, OpHandler* op_handler,
                                MutableArrayRef<TensorHandle> arguments,
                                const OpAttrsRef& attrs,
                                MutableArrayRef<TensorHandle> results,
                                AsyncValueRef<Chain>* chain) {
  // Ask the op_handler to execute the op. If successful, we're done.
  auto op_handle = op_handler->MakeOp(op_name);
  if (op_handle) {
    op_handle.get()(exec_ctx, arguments, attrs, results, chain);
    return;
  }
  // Otherwise, we fail with an 'unknown op' error.
  auto err =
      EmitErrorAsync(exec_ctx, "op '" + op_name.str() + "' is not supported");
  for (auto& result : results) result = TensorHandle(err.CopyRef());
  if (chain) *chain = std::move(err);
}
25
https://github.com/tensorflow/runtime/blob/master/lib/core_runtime/core_runtime.cc#L124-L143
https://github.com/tensorflow/runtime/blob/master/documents/tfrt_op_by_op_execution_design.md
BEF Executor for “op” graph
• corert.executeop

• sample
26
https://github.com/tensorflow/runtime/blob/master/lib/core_runtime/kernels.cc
func @example() -> !tfrt.chain {
%cpu = corert.get_op_handler("cpu")
// Create TensorHandles
%lhs = corert.executeop(%cpu)
"test.create_dense_tensor"() { shape = [1, 1], values = [-1.0 : f32] }
%rhs = corert.executeop(%cpu)
"test.create_dense_tensor"() { shape = [1, 1], values = [-2.0 : f32] }
%result = corert.executeop(%cpu) "test.add" (%lhs, %rhs)
%ch0 = tfrt.new.chain
%ch1 = corert.print_tensorhandle(%result, %ch0)
tfrt.return %ch1 : !tfrt.chain
}
func @example() -> !tfrt.chain {
%ch0 = tfrt.new.chain
%cpu = corert.get_op_handler %ch0 "cpu"
// Create TensorHandles
%lhs = corert.executeop(%cpu)
"test.create_dense_tensor"() { shape = [1, 1], values = [-1.0 : f32] } : 1
%rhs = corert.executeop(%cpu)
"test.create_dense_tensor"() { shape = [1, 1], values = [-2.0 : f32] } : 1
%result = corert.executeop(%cpu) "test.add" (%lhs, %rhs) : 1
%ch1 = "corert.print_tensorhandle"(%result, %ch0) : (!corert.tensorhandle, !tfrt.chain) -> !tfrt.chain
tfrt.return %ch1 : !tfrt.chain
}
Device Runtime
CPU
27
//===----------------------------------------------------------------------===//
// CPU Relu kernels
//===----------------------------------------------------------------------===//

// Computes B = Relu(A).
template <typename T>
static AsyncValueRef<Chain> Relu(const DenseHostTensor& A, DenseHostTensor* B,
                                 const ExecutionContext& exec_ctx) {
  auto fn = [](auto& a, auto& b) { return a.cwiseMax(static_cast<T>(0)); };
  return ::tfrt::compat::UnaryEigenKernelAsync<T, T>(A, B, std::move(fn),
                                                     exec_ctx);
}

//===----------------------------------------------------------------------===//
// CPU BiasAdd kernels
//===----------------------------------------------------------------------===//

// A special case of tf.add where bias is restricted to be 1-D.
// Currently only support NHWC data format.
template <typename T, size_t RANK>
static AsyncValueRef<Chain> BiasAdd(const DenseHostTensor& input,
                                    const DenseHostTensor& bias,
                                    DenseHostTensor* output,
                                    const ExecutionContext& exec_ctx) {
  DHTIndexableView<T, RANK> input_view(&input);
  MutableDHTIndexableView<T, RANK> output_view(output);
  DHTIndexableView<T, 1> bias_view(&bias);
  const auto& shape_input = input_view.FixedShape();
  const auto& shape_bias = bias_view.FixedShape();
  const auto& shape_output = output_view.FixedShape();
  if (shape_input != shape_output) {
    return EmitErrorAsync(exec_ctx, "unexpected output shape");
  }
  if (shape_bias[0] != shape_input[RANK - 1]) {
    return EmitErrorAsync(exec_ctx, "bias shape does not match input shape");
  }
  // Reshape bias to the shape of input. Broadcast along the last axis of input.
  Eigen::array<Eigen::Index, RANK> reshape_dims;
  Eigen::array<Eigen::Index, RANK> broadcast_dims;
  for (size_t i = 0; i < RANK - 1; ++i) {
    reshape_dims[i] = static_cast<Eigen::Index>(1);
    broadcast_dims[i] = static_cast<Eigen::Index>(shape_input[i]);
  }
  reshape_dims[RANK - 1] = static_cast<Eigen::Index>(shape_bias[0]);
  broadcast_dims[RANK - 1] = static_cast<Eigen::Index>(1);
  auto input_t = AsEigenConstTensor(input_view);
  auto bias_t = AsEigenConstTensor(bias_view);
  auto output_t = AsEigenTensor(output_view);
  auto expr = input_t + bias_t.reshape(reshape_dims).broadcast(broadcast_dims);
  return AsyncAssign(
      exec_ctx.host()->GetOrCreateSharedContext<EigenHostContext>(),
      std::move(output_t), std::move(expr),
      KeepBuffers::alive(&input, &bias, output));
}
https://github.com/tensorflow/runtime/blob/master/backends/cpu/lib/kernels/cpu_kernels.h
Dialects we can see now
• tfrt: we know what this is for

• tfrt_test: to test tfrt

• tfrt_data: tf.data, to deal with input pipeline

• tfrt_dht: dense host tensor

• corert: Core Runtime, eager execution

• ts: tensor shape

• coo: COOrdinate list sparse tensor

• eigen: wrapper around the eigen library

• btf: binary tensor format

• cuda: you know what cuda means :-)
28
Concluding Remarks
• MLIR-related talks and publications: https://mlir.llvm.org/talks/

• We scratched the surface of TFRT host runtime and core runtime. There are more details

• threading model: thread pool / work queue,

• memory allocation: tcmalloc for server, other small allocators for embedded systems,

• non-strict execution, and

• registers: BEF executor is a register machine

• we didn't touch other important components, such as the device runtimes, esp. the GPU part, and the distributed environment
29
Fin
30
Device Runtime Design Principles 

• A thin wrapper of low-level (driver) APIs, exposing device capabilities to graph compiler

• Memory Allocation

• Async host <-> device transfer, and kernel execution

• Dependency management

• Focus on mechanism instead of policy

• E.g. No built-in special-purpose streams for GPU support:
• For pure eager execution, can default to one stream for everything 

• For tf.function execution, compiler can pick streams
31
Ad

More Related Content

What's hot (20)

The Rust Programming Language: an Overview
The Rust Programming Language: an OverviewThe Rust Programming Language: an Overview
The Rust Programming Language: an Overview
Roberto Casadei
 
Embedded Android : System Development - Part III (Audio / Video HAL)
Embedded Android : System Development - Part III (Audio / Video HAL)Embedded Android : System Development - Part III (Audio / Video HAL)
Embedded Android : System Development - Part III (Audio / Video HAL)
Emertxe Information Technologies Pvt Ltd
 
Coroutines for Kotlin Multiplatform in Practise
Coroutines for Kotlin Multiplatform in PractiseCoroutines for Kotlin Multiplatform in Practise
Coroutines for Kotlin Multiplatform in Practise
Christian Melchior
 
Kafka’s New Control Plane: The Quorum Controller | Colin McCabe, Confluent
Kafka’s New Control Plane: The Quorum Controller | Colin McCabe, ConfluentKafka’s New Control Plane: The Quorum Controller | Colin McCabe, Confluent
Kafka’s New Control Plane: The Quorum Controller | Colin McCabe, Confluent
HostedbyConfluent
 
Running Spring Boot Applications as GraalVM Native Images
Running Spring Boot Applications as GraalVM Native ImagesRunning Spring Boot Applications as GraalVM Native Images
Running Spring Boot Applications as GraalVM Native Images
VMware Tanzu
 
Kotlin Coroutines and Android sitting in a tree
Kotlin Coroutines and Android sitting in a treeKotlin Coroutines and Android sitting in a tree
Kotlin Coroutines and Android sitting in a tree
Kai Koenig
 
The Nextcloud Roadmap for Secure Team Collaboration
The Nextcloud Roadmap for Secure Team CollaborationThe Nextcloud Roadmap for Secure Team Collaboration
The Nextcloud Roadmap for Secure Team Collaboration
Univention GmbH
 
Cgroups in android
Cgroups in androidCgroups in android
Cgroups in android
ramalinga prasad tadepalli
 
Git Version Control System
Git Version Control SystemGit Version Control System
Git Version Control System
KMS Technology
 
GIT presentation
GIT presentationGIT presentation
GIT presentation
Naim Latifi
 
Develop Your Own Operating Systems using Cheap ARM Boards
Develop Your Own Operating Systems using Cheap ARM BoardsDevelop Your Own Operating Systems using Cheap ARM Boards
Develop Your Own Operating Systems using Cheap ARM Boards
National Cheng Kung University
 
Feature Flags.pdf
Feature Flags.pdfFeature Flags.pdf
Feature Flags.pdf
Marc Hornbeek
 
Intro to kotlin
Intro to kotlinIntro to kotlin
Intro to kotlin
Tomislav Homan
 
Introduction To Git
Introduction To GitIntroduction To Git
Introduction To Git
Arnaud Seilles
 
GMock framework
GMock frameworkGMock framework
GMock framework
corehard_by
 
Rust vs C++
Rust vs C++Rust vs C++
Rust vs C++
corehard_by
 
Introduzione a Git (ITA - 2017)
Introduzione a Git (ITA - 2017)Introduzione a Git (ITA - 2017)
Introduzione a Git (ITA - 2017)
Valerio Radice
 
How to Use OWASP Security Logging
How to Use OWASP Security LoggingHow to Use OWASP Security Logging
How to Use OWASP Security Logging
Milton Smith
 
Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?
Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?
Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?
Koan-Sin Tan
 
Git workflows
Git workflowsGit workflows
Git workflows
Thuc Le Dong
 
The Rust Programming Language: an Overview
The Rust Programming Language: an OverviewThe Rust Programming Language: an Overview
The Rust Programming Language: an Overview
Roberto Casadei
 
Coroutines for Kotlin Multiplatform in Practise
Coroutines for Kotlin Multiplatform in PractiseCoroutines for Kotlin Multiplatform in Practise
Coroutines for Kotlin Multiplatform in Practise
Christian Melchior
 
Kafka’s New Control Plane: The Quorum Controller | Colin McCabe, Confluent
Kafka’s New Control Plane: The Quorum Controller | Colin McCabe, ConfluentKafka’s New Control Plane: The Quorum Controller | Colin McCabe, Confluent
Kafka’s New Control Plane: The Quorum Controller | Colin McCabe, Confluent
HostedbyConfluent
 
Running Spring Boot Applications as GraalVM Native Images
Running Spring Boot Applications as GraalVM Native ImagesRunning Spring Boot Applications as GraalVM Native Images
Running Spring Boot Applications as GraalVM Native Images
VMware Tanzu
 
Kotlin Coroutines and Android sitting in a tree
Kotlin Coroutines and Android sitting in a treeKotlin Coroutines and Android sitting in a tree
Kotlin Coroutines and Android sitting in a tree
Kai Koenig
 
The Nextcloud Roadmap for Secure Team Collaboration
The Nextcloud Roadmap for Secure Team CollaborationThe Nextcloud Roadmap for Secure Team Collaboration
The Nextcloud Roadmap for Secure Team Collaboration
Univention GmbH
 
Git Version Control System
Git Version Control SystemGit Version Control System
Git Version Control System
KMS Technology
 
GIT presentation
GIT presentationGIT presentation
GIT presentation
Naim Latifi
 
Develop Your Own Operating Systems using Cheap ARM Boards
Develop Your Own Operating Systems using Cheap ARM BoardsDevelop Your Own Operating Systems using Cheap ARM Boards
Develop Your Own Operating Systems using Cheap ARM Boards
National Cheng Kung University
 
Introduzione a Git (ITA - 2017)
Introduzione a Git (ITA - 2017)Introduzione a Git (ITA - 2017)
Introduzione a Git (ITA - 2017)
Valerio Radice
 
How to Use OWASP Security Logging
How to Use OWASP Security LoggingHow to Use OWASP Security Logging
How to Use OWASP Security Logging
Milton Smith
 
Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?
Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?
Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?
Koan-Sin Tan
 

Similar to A Peek into TFRT (20)

A Sneak Peek of MLIR in TensorFlow
A Sneak Peek of MLIR in TensorFlowA Sneak Peek of MLIR in TensorFlow
A Sneak Peek of MLIR in TensorFlow
Koan-Sin Tan
 
Dynamic Instrumentation- OpenEBS Golang Meetup July 2017
Dynamic Instrumentation- OpenEBS Golang Meetup July 2017Dynamic Instrumentation- OpenEBS Golang Meetup July 2017
Dynamic Instrumentation- OpenEBS Golang Meetup July 2017
OpenEBS
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old Secrets
Brendan Gregg
 
Os lectures
Os lecturesOs lectures
Os lectures
Adnan Ghafoor
 
LEX lexical analyzer for compiler theory.ppt
LEX lexical analyzer for compiler theory.pptLEX lexical analyzer for compiler theory.ppt
LEX lexical analyzer for compiler theory.ppt
dralexpasion
 
.NET Multithreading/Multitasking
.NET Multithreading/Multitasking.NET Multithreading/Multitasking
.NET Multithreading/Multitasking
Sasha Kravchuk
 
Threads
ThreadsThreads
Threads
Sameer Shaik
 
Virtual platform
Virtual platformVirtual platform
Virtual platform
sean chen
 
44CON London 2015 - Reverse engineering and exploiting font rasterizers: the ...
44CON London 2015 - Reverse engineering and exploiting font rasterizers: the ...44CON London 2015 - Reverse engineering and exploiting font rasterizers: the ...
44CON London 2015 - Reverse engineering and exploiting font rasterizers: the ...
44CON
 
freertos-proj.pdf
freertos-proj.pdffreertos-proj.pdf
freertos-proj.pdf
AswathRangaraj1
 
Tensorflow internal
Tensorflow internalTensorflow internal
Tensorflow internal
Hyunghun Cho
 
2004 ugm-tips-tricks
2004 ugm-tips-tricks2004 ugm-tips-tricks
2004 ugm-tips-tricks
Shamoon Jamshed
 
Industry - Program analysis and verification - Type-preserving Heap Profiler ...
Industry - Program analysis and verification - Type-preserving Heap Profiler ...Industry - Program analysis and verification - Type-preserving Heap Profiler ...
Industry - Program analysis and verification - Type-preserving Heap Profiler ...
ICSM 2011
 
C++ Advanced Features
C++ Advanced FeaturesC++ Advanced Features
C++ Advanced Features
Michael Redlich
 
Introduction to TensorFlow Lite
Introduction to TensorFlow Lite Introduction to TensorFlow Lite
Introduction to TensorFlow Lite
Koan-Sin Tan
 
OpenSAF Symposium_Python Bindings_9.21.11
OpenSAF Symposium_Python Bindings_9.21.11OpenSAF Symposium_Python Bindings_9.21.11
OpenSAF Symposium_Python Bindings_9.21.11
OpenSAF Foundation
 
Week1 Electronic System-level ESL Design and SystemC Begin
Week1 Electronic System-level ESL Design and SystemC BeginWeek1 Electronic System-level ESL Design and SystemC Begin
Week1 Electronic System-level ESL Design and SystemC Begin
敬倫 林
 
Standard Library Functions
Standard Library FunctionsStandard Library Functions
Standard Library Functions
Praveen M Jigajinni
 
Flink Forward SF 2017: Eron Wright - Introducing Flink Tensorflow
Flink Forward SF 2017: Eron Wright - Introducing Flink TensorflowFlink Forward SF 2017: Eron Wright - Introducing Flink Tensorflow
Flink Forward SF 2017: Eron Wright - Introducing Flink Tensorflow
Flink Forward
 
Linux Perf Tools
Linux Perf ToolsLinux Perf Tools
Linux Perf Tools
Raj Pandey
 
A Sneak Peek of MLIR in TensorFlow
A Sneak Peek of MLIR in TensorFlowA Sneak Peek of MLIR in TensorFlow
A Sneak Peek of MLIR in TensorFlow
Koan-Sin Tan
 
Dynamic Instrumentation- OpenEBS Golang Meetup July 2017
Dynamic Instrumentation- OpenEBS Golang Meetup July 2017Dynamic Instrumentation- OpenEBS Golang Meetup July 2017
Dynamic Instrumentation- OpenEBS Golang Meetup July 2017
OpenEBS
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old Secrets
Brendan Gregg
 
LEX lexical analyzer for compiler theory.ppt
LEX lexical analyzer for compiler theory.pptLEX lexical analyzer for compiler theory.ppt
LEX lexical analyzer for compiler theory.ppt
dralexpasion
 
.NET Multithreading/Multitasking
.NET Multithreading/Multitasking.NET Multithreading/Multitasking
.NET Multithreading/Multitasking
Sasha Kravchuk
 
Virtual platform
Virtual platformVirtual platform
Virtual platform
sean chen
 
44CON London 2015 - Reverse engineering and exploiting font rasterizers: the ...
44CON London 2015 - Reverse engineering and exploiting font rasterizers: the ...44CON London 2015 - Reverse engineering and exploiting font rasterizers: the ...
44CON London 2015 - Reverse engineering and exploiting font rasterizers: the ...
44CON
 
Tensorflow internal
Tensorflow internalTensorflow internal
Tensorflow internal
Hyunghun Cho
 
Industry - Program analysis and verification - Type-preserving Heap Profiler ...
Industry - Program analysis and verification - Type-preserving Heap Profiler ...Industry - Program analysis and verification - Type-preserving Heap Profiler ...
Industry - Program analysis and verification - Type-preserving Heap Profiler ...
ICSM 2011
 
Introduction to TensorFlow Lite
Introduction to TensorFlow Lite Introduction to TensorFlow Lite
Introduction to TensorFlow Lite
Koan-Sin Tan
 
OpenSAF Symposium_Python Bindings_9.21.11
OpenSAF Symposium_Python Bindings_9.21.11OpenSAF Symposium_Python Bindings_9.21.11
OpenSAF Symposium_Python Bindings_9.21.11
OpenSAF Foundation
 
Week1 Electronic System-level ESL Design and SystemC Begin
Week1 Electronic System-level ESL Design and SystemC BeginWeek1 Electronic System-level ESL Design and SystemC Begin
Week1 Electronic System-level ESL Design and SystemC Begin
敬倫 林
 
Flink Forward SF 2017: Eron Wright - Introducing Flink Tensorflow
Flink Forward SF 2017: Eron Wright - Introducing Flink TensorflowFlink Forward SF 2017: Eron Wright - Introducing Flink Tensorflow
Flink Forward SF 2017: Eron Wright - Introducing Flink Tensorflow
Flink Forward
 
Linux Perf Tools
Linux Perf ToolsLinux Perf Tools
Linux Perf Tools
Raj Pandey
 
Ad

More from Koan-Sin Tan (14)

running stable diffusion on android
running stable diffusion on androidrunning stable diffusion on android
running stable diffusion on android
Koan-Sin Tan
 
Exploring Your Apple M1 devices with Open Source Tools
Exploring Your Apple M1 devices with Open Source ToolsExploring Your Apple M1 devices with Open Source Tools
Exploring Your Apple M1 devices with Open Source Tools
Koan-Sin Tan
 
Running TFLite on Your Mobile Devices, 2020
Running TFLite on Your Mobile Devices, 2020Running TFLite on Your Mobile Devices, 2020
Running TFLite on Your Mobile Devices, 2020
Koan-Sin Tan
 
Exploring Thermal Related Stuff in iDevices using Open-Source Tool
Exploring Thermal Related Stuff in iDevices using Open-Source ToolExploring Thermal Related Stuff in iDevices using Open-Source Tool
Exploring Thermal Related Stuff in iDevices using Open-Source Tool
Koan-Sin Tan
 
A Peek into Google's Edge TPU
A Peek into Google's Edge TPUA Peek into Google's Edge TPU
A Peek into Google's Edge TPU
Koan-Sin Tan
 
open source nn frameworks on cellphones
open source nn frameworks on cellphonesopen source nn frameworks on cellphones
open source nn frameworks on cellphones
Koan-Sin Tan
 
Caffe2 on Android
Caffe2 on AndroidCaffe2 on Android
Caffe2 on Android
Koan-Sin Tan
 
Tensorflow on Android
Tensorflow on AndroidTensorflow on Android
Tensorflow on Android
Koan-Sin Tan
 
SoC Idling for unconf COSCUP 2016
SoC Idling for unconf COSCUP 2016SoC Idling for unconf COSCUP 2016
SoC Idling for unconf COSCUP 2016
Koan-Sin Tan
 
A peek into Python's Metaclass and Bytecode from a Smalltalk User
A peek into Python's Metaclass and Bytecode from a Smalltalk UserA peek into Python's Metaclass and Bytecode from a Smalltalk User
A peek into Python's Metaclass and Bytecode from a Smalltalk User
Koan-Sin Tan
 
Android Wear and the Future of Smartwatch
Android Wear and the Future of SmartwatchAndroid Wear and the Future of Smartwatch
Android Wear and the Future of Smartwatch
Koan-Sin Tan
 
Understanding Android Benchmarks
Understanding Android BenchmarksUnderstanding Android Benchmarks
Understanding Android Benchmarks
Koan-Sin Tan
 
Dark Silicon, Mobile Devices, and Possible Open-Source Solutions
Dark Silicon, Mobile Devices, and Possible Open-Source SolutionsDark Silicon, Mobile Devices, and Possible Open-Source Solutions
Dark Silicon, Mobile Devices, and Possible Open-Source Solutions
Koan-Sin Tan
 
Smalltalk and ruby - 2012-12-08
Smalltalk and ruby  - 2012-12-08Smalltalk and ruby  - 2012-12-08
Smalltalk and ruby - 2012-12-08
Koan-Sin Tan
 
running stable diffusion on android
running stable diffusion on androidrunning stable diffusion on android
running stable diffusion on android
Koan-Sin Tan
 
Exploring Your Apple M1 devices with Open Source Tools
Exploring Your Apple M1 devices with Open Source ToolsExploring Your Apple M1 devices with Open Source Tools
Exploring Your Apple M1 devices with Open Source Tools
Koan-Sin Tan
 
Running TFLite on Your Mobile Devices, 2020
Running TFLite on Your Mobile Devices, 2020Running TFLite on Your Mobile Devices, 2020
Running TFLite on Your Mobile Devices, 2020
Koan-Sin Tan
 
Exploring Thermal Related Stuff in iDevices using Open-Source Tool
Exploring Thermal Related Stuff in iDevices using Open-Source ToolExploring Thermal Related Stuff in iDevices using Open-Source Tool
Exploring Thermal Related Stuff in iDevices using Open-Source Tool
Koan-Sin Tan
 
A Peek into Google's Edge TPU
A Peek into Google's Edge TPUA Peek into Google's Edge TPU
A Peek into Google's Edge TPU
Koan-Sin Tan
 
open source nn frameworks on cellphones
open source nn frameworks on cellphonesopen source nn frameworks on cellphones
open source nn frameworks on cellphones
Koan-Sin Tan
 
Tensorflow on Android
Tensorflow on AndroidTensorflow on Android
Tensorflow on Android
Koan-Sin Tan
 
SoC Idling for unconf COSCUP 2016
SoC Idling for unconf COSCUP 2016SoC Idling for unconf COSCUP 2016
SoC Idling for unconf COSCUP 2016
Koan-Sin Tan
 
A peek into Python's Metaclass and Bytecode from a Smalltalk User
A peek into Python's Metaclass and Bytecode from a Smalltalk UserA peek into Python's Metaclass and Bytecode from a Smalltalk User
A peek into Python's Metaclass and Bytecode from a Smalltalk User
Koan-Sin Tan
 
Android Wear and the Future of Smartwatch
Android Wear and the Future of SmartwatchAndroid Wear and the Future of Smartwatch
Android Wear and the Future of Smartwatch
Koan-Sin Tan
 
Understanding Android Benchmarks
Understanding Android BenchmarksUnderstanding Android Benchmarks
Understanding Android Benchmarks
Koan-Sin Tan
 
Dark Silicon, Mobile Devices, and Possible Open-Source Solutions
Dark Silicon, Mobile Devices, and Possible Open-Source SolutionsDark Silicon, Mobile Devices, and Possible Open-Source Solutions
Dark Silicon, Mobile Devices, and Possible Open-Source Solutions
Koan-Sin Tan
 
Smalltalk and ruby - 2012-12-08
Smalltalk and ruby  - 2012-12-08Smalltalk and ruby  - 2012-12-08
Smalltalk and ruby - 2012-12-08
Koan-Sin Tan
 
Ad

Recently uploaded (20)

Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 

A Peek into TFRT

  • 1. Koan-Sin Tan, [email protected] COSCUP, Aug 2nd, 2020 TensorFlow Runtime A Peek into the Future of TensorFlow 1
  • 2. • disclaimer: opinions are my own • feel free to interrupt me if you have any questions during the presentation • questions could be Taiwanese, English, or Mandarin • most of TFRT materials are adapted from TFRT deep dive in MLIR design meeting [1] and TFRT docs [2] • code around Aug 1, 2020 (git commit ecf1c20 [3]) [1] TFRT Deep Dive,  slides - recording, https://mlir.llvm.org/talks/ [2] https://github.com/tensorflow/runtime/tree/master/documents [3] https://github.com/tensorflow/runtime/commit/ecf1c20 2
  • 3. • Used open source before the term “open source” is used • A software guy, learned to use Unix and open source software on VAX-11/780 running 4.3BSD • Used to be a programming language junkie • Worked on various system software, e.g., CPU scheduling and power management of non- CPU components • Recently, on NN performance on edge devices related stuff • Contributed from time to time to TensorFlow Lite • started a command line label_image for TFLite who i am https://gunkies.org/w/images/c/c1/DEC-VAX-11-780.jpg 3
  • 4. What is TFRT • TensorFlow Runtime (TFRT) is one of the two new MLIR runtimes emerged in 2020 so far. • The other one is Intermediate Representation Execution Environment, IREE. It seems so far tfrt has better design documentation • Both of them have mobile / edge environment in mind. • I didn’t see mobile accelerated code in TFRT yet. • IREE has some Vulkan related code and some simple code works on Android already • ResNet GPU inference is 28% faster with TFRT • https://github.com/tensorflow/runtime, https://youtu.be/15tiQoPpuZ8 4
  • 5. Build it • if you follow the instructions described in README.md, it should just work. At least on x86_64 linux. • however, it’s not tested for non Linux environment yet • ssize_t and int64_t • on Mac OS X: ssize_t: long, int64_t: long long • current code mixed the use of ssize_t and int64_t • test: one the acclaimed features of TFRT, like MLIR, is its use of 
 LLVM FileCheck • my hacks, shape related (ssize_t) tests not fixed yet • it’s not tested on non-x86 platforms, such as aarch64, either 
 • 5
  • 6. • The three key directories under the TFRT root directory are • lib: Contains core TFRT infrastructure code • backends: Contains device specific infrastructure and op/kernel implementations • include: Contains public header files for core TFRT infrastructure 6
  • 7. Walking thru the tutorial • unfortunately, it seems it’s not easy to jump directly into source code without having some background knowledge • so we’ll walk thru the tutorial [1] • What are in the tutorial • print hello world • print integer • adding kernels [1] https://github.com/tensorflow/runtime/blob/master/documents/tutorial.md 7
  • 8. using tfrt and tfrt_test hello.mlir func @hello() { %chain = tfrt.new.chain // Create a string containing "hello world" and store it in %hello. %hello = "tfrt_test.get_string"() { string_attr = "hello world" } : () -> !tfrt.string // Print the string in %hello. "tfrt_test.print_string"(%hello, %chain) : (!tfrt.string, !tfrt.chain) -> !tfrt.chain tfrt.return } The ‘@hello function above shows how to create and print a string. The text after each ‘:’ specifies the types involved: • ()->!tfrt.string means that tfrt_test.get_string takes no arguments and returns a !tfrt.string. tfrt is a MLIR dialect prefix (or namespace) for TFRT • (!tfrt.string, !tfrt.chain) -> !tfrt.chain means that tfrt_test.print_string takes two arguments (! tfrt.string and !tfrt.chain) and returns a !tfrt.chain. chain [1] is a TFRT abstraction to manage dependencies [1] https://github.com/tensorflow/runtime/blob/master/documents/explicit_dependency.md 8
  • 9. hello world in MLIR func @stringconstant() -> !llvm<"[12 x i8]"> { %1 = llvm.constant("Hello world!") : !llvm<"i8*"> // CHECK: ret [12 x i8] c"Hello world!" llvm.return %1 : !llvm<"i8*"> } func @main() { %0 = llvm.constant(0) : !llvm.i64 %1 = call @stringconstant() : () -> !llvm<"[12 x i8]"> %2 = llvm.getelementptr %1[%0] : (!llvm<"[12 x i8]">, !llvm.i64) -> !llvm<"i8*"> %3 = llvm.bitcast %2 : !llvm<"i8*"> to !llvm<"i8*"> %32 = llvm.call @puts(%2) : (!llvm<"i8*">) -> !llvm.i32 return } func @puts(!llvm<"i8*">) -> !llvm.i32 • MLIR “standard dialect” doesn’t have I/O functions • there is LLVM dialect, of course we can use LLVM to call standard libc function 9
  • 10. Hello integer func @hello_integers() { %chain = tfrt.new.chain // Create an integer containing 42. %forty_two = tfrt.constant.i32 42 // Print 42. tfrt.print.i32 %forty_two, %chain tfrt.return } • as stated in the tutorial, we can run other functions in the same modular • we can turn to more basic ones, such as integers or floating point numbers • @hello_integers shows how to create and print integers • This example does not have the verbose type information we saw in @hello because there are custom parsers for the tfrt.constant.i32 and tfrt.print.32 kernels in basic_kernels.td 10
  • 11. basic_kernels.td • .td (table description?) files are for LLVM TableGen [1] TableGen, https://llvm.org/docs/TableGen/ class ConstantOp<string suffix, Type baseType, Attr attr> : TFRT_Op<"constant." # suffix, [NoSideEffect]> { let summary = "host executor constant value constructor"; let arguments = (ins attr:$value); let results = (outs baseType); } class PrintOp<string suffix, Type type> : TFRT_Op<"print." # suffix> { let summary = "tfrt.print operation"; let description = [{ An operation takes a number input and a chain input. It prints the number to stdout and returns a chain output. The chain input must be the second operand. Example: %2 = tfrt.print.i32 %0, %1 }]; let arguments = (ins type, TFRT_ChainType); let results = (outs TFRT_ChainType); let assemblyFormat = "operands attr-dict"; let verifier = ?; } https://github.com/tensorflow/runtime/blob/master/include/tfrt/basic_kernels/opdefs/basic_kernels.td#L376-L390 https://github.com/tensorflow/runtime/blob/master/include/tfrt/basic_kernels/opdefs/basic_kernels.td#L58-L64 11
  • 13. user defined kernels func @print_coordinate() { %chain = tfrt.new.chain %two = tfrt.constant.i32 2 %four = tfrt.constant.i32 4 %coordinate = "my.create_coordinate"(%two, %four) : (i32, i32) -> !my.coordinate "my.print_coordinate"(%coordinate, %chain) : (!my.coordinate, !tfrt.chain) -> !tfrt.chain tfrt.return } coordinate.mlir shows several TFRT features: • MLIR types that begin with exclamation mark (!) are user-defined types like !my.coordinate, compared to built-in types like i32 • Kernels are just C++ functions with a name in MLIR: my.print_coordinate is the MLIR name for the C++ PrintCoordinate function • Kernels may pass arbitrary user-defined types: my.create_coordinate passes a custom Coordinate struct to my.print_coordinate 13
  • 14. to dig into some code we need more system information 14
  • 16. • TensorFlow user passes into TFRT a TensorFlow graph created via high-level TensorFlow APIs, and • TFRT then calls the MLIR-based graph compiler to optimize and lower the graph into BEF, a Binary Executable Format for TFRT graph execution (MLIR is the compiler infrastructure that we use to represent TFRT host programs). • The blue arrows in the simplified TensorFlow training stack diagram show this flow. 16
  • 17. • In the README.md we are told to build two binaries: tfrt_translate and bef_excutor • tfrt_translate • The tfrt_translate program does round trip translation between MLIR and BEF, similar to an assembler and disassembler. • bef_executor • The bef_executor program is the execution driver of BEF files. It reads in a BEF file, sets up runtime, and asynchronously executes function(s) in that file. 17
  • 18. TFRT Host Runtime • Foundation of TFRT: schedules work on the host and devices • Clean separation between host and device runtimes: • Host runtime does not know anything about devices, just their runtimes (sets of kernels) • Key design points: • Fully asynchronous - kernel executions can not block • Excellent error propagation in the presence of asynchrony • Performance as a first-class concern, for graph and eager • Outline: • Common runtime infrastructure • Graph execution • Op-by-op execution (“eager”) 18
  • 19. • Container for data or resources • Not Tensor specific • A “future” type, fulfilled with exactly one value, or an error • Lock-free, low memory overhead, type erased, reference counted • Helper class AsyncValueRef<T> provides type safety when contained type is known • AsyncValues enable efficient asynchronous compute • Asynchronous functions return unavailable AsyncValues • Caller can schedule dependent computations with AsyncValue::AndThen() • Caller need not block until AsyncValue becomes available Key Abstraction: AsyncValue https://github.com/tensorflow/runtime/blob/master/include/tfrt/host_context/async_value.h 19
  • 20. Kernels • Kernel: unit of computation scheduled by the runtime • Similar to kernel concept in current TensorFlow • Kernels accept AsyncValue inputs and produce AsyncValue output • Runtime coordinates dataflow of AsyncValues between kernels • Outputs may not be immediately available, unlike current TensorFlow • Runtime generally does not understand kernel semantics // Kernel that adds two integers. // AsyncKernelFrame holds the kernel’s arguments and results. static void TFRTAdd(AsyncKernelFrame* frame) { // Fetch the kernel’s 0th argument. AsyncValue* arg1 = frame->GetArgAt(0); // Fetch the kernel’s 1st argument. AsyncValue* arg2 = frame->GetArgAt(1); int v1 = arg1->get<int>(); int v2 = arg2->get<int>(); // Set the kernel’s 0th result. frame->EmplaceResultAt<int>(0, v1 + v2); } https://github.com/tensorflow/runtime/blob/master/documents/tfrt_host_runtime_design.md https://github.com/tensorflow/runtime/blob/master/lib/basic_kernels/integer_kernels.cc#L39-L45 https://github.com/tensorflow/runtime/blob/master/include/tfrt/host_context/kernel_utils.h#L61-L149 20
• 21. Host Program
• Host programs encode a dataflow graph
• Similar to GraphDef in current TensorFlow
• Expressed in MLIR. Typically compiler generated
• Designed for low-level dispatch efficiency
• Designed for compiler transformations and analysis, e.g.,
• Use dataflow analysis for buffer reuse

func @sample_function() -> i32 {
  %one = tfrt.constant.i32 1        // Make AsyncValue with value 1
  %two = tfrt.constant.i32 2        // Make AsyncValue with value 2
  %three = tfrt.add.i32 %one, %two  // Make AsyncValue with value 3 (1+2)
  %ch0 = tfrt.new.chain
  tfrt.print.i32 %three, %ch0       // Print AsyncValue %three
  tfrt.return %three : i32          // Return AsyncValue %three
}
21
• 22. TFRT Binary Executable Format (BEF)
• BEF encodes a hardware-specific lowered graph function
• Primary interface between compiler and runtime
• Designed for efficient execution
• Low overhead: execute program by reading mmap’d byte array
• Persistent and stable: Compile once offline, run many times online. Great for inference use-cases
• Composed of sections, similar to ELF. Each section has its own format
• Extensible: BEF is versioned, reader ignores unknown sections, new versions may define new sections
https://github.com/tensorflow/runtime/blob/master/documents/binary_executable_format.md
22
• 23. BEF Executor
• BEF Executor evaluates a BEF dataflow graph “executor” style:
• Not a bytecode-like interpreter: no concept of program counter
• “Strict” execution by default: run a kernel only when all its inputs are available
• Executor features:
• Lock-free: atomics instead of mutexes
• Non-blocking: defer dependent work with AsyncValue::AndThen
• Supports “non-strict” execution: may run a kernel when some of its inputs are available
• Good for efficiently forwarding unavailable inputs to outputs
• Key concepts:
• BEF: dataflow graph
• Kernel: dataflow node
• AsyncValues: dataflow edge
https://github.com/tensorflow/runtime/blob/master/lib/bef_executor/bef_interpreter.cc#L223-L254
23
• 25. How about Core Runtime?
• Surely, we could do a similar walkthrough, but that would take more time
• Two things
• Op Execution API, Execute()
• BEF Executor can handle it too

void CoreRuntime::Impl::Execute(const ExecutionContext& exec_ctx,
                                string_view op_name, OpHandler* op_handler,
                                MutableArrayRef<TensorHandle> arguments,
                                const OpAttrsRef& attrs,
                                MutableArrayRef<TensorHandle> results,
                                AsyncValueRef<Chain>* chain) {
  // Ask the op_handler to execute the op. If successful, we're done.
  auto op_handle = op_handler->MakeOp(op_name);
  if (op_handle) {
    op_handle.get()(exec_ctx, arguments, attrs, results, chain);
    return;
  }

  // Otherwise, we fail with an 'unknown op' error.
  auto err = EmitErrorAsync(exec_ctx, "op '" + op_name.str() + "' is not supported");
  for (auto& result : results) result = TensorHandle(err.CopyRef());
  if (chain) *chain = std::move(err);
}

https://github.com/tensorflow/runtime/blob/master/lib/core_runtime/core_runtime.cc#L124-L143
https://github.com/tensorflow/runtime/blob/master/documents/tfrt_op_by_op_execution_design.md
25
• 26. BEF Executor for “op” graph
• corert.executeop
• sample
https://github.com/tensorflow/runtime/blob/master/lib/core_runtime/kernels.cc

func @example() -> !tfrt.chain {
  %cpu = corert.get_op_handler("cpu")

  // Create TensorHandles
  %lhs = corert.executeop(%cpu) "test.create_dense_tensor"() { shape = [1, 1], values = [-1.0 : f32] }
  %rhs = corert.executeop(%cpu) "test.create_dense_tensor"() { shape = [1, 1], values = [-2.0 : f32] }
  %result = corert.executeop(%cpu) "test.add" (%lhs, %rhs)

  %ch0 = tfrt.new.chain
  %ch1 = corert.print_tensorhandle(%result, %ch0)
  tfrt.return %ch1 : !tfrt.chain
}

func @example() -> !tfrt.chain {
  %ch0 = tfrt.new.chain
  %cpu = corert.get_op_handler %ch0 "cpu"

  // Create TensorHandles
  %lhs = corert.executeop(%cpu) "test.create_dense_tensor"() { shape = [1, 1], values = [-1.0 : f32] } : 1
  %rhs = corert.executeop(%cpu) "test.create_dense_tensor"() { shape = [1, 1], values = [-2.0 : f32] } : 1
  %result = corert.executeop(%cpu) "test.add" (%lhs, %rhs) : 1

  %ch1 = "corert.print_tensorhandle"(%result, %ch0) : (!corert.tensorhandle, !tfrt.chain) -> !tfrt.chain
  tfrt.return %ch1 : !tfrt.chain
}
26
• 27. Device Runtime CPU

//===----------------------------------------------------------------------===//
// CPU Relu kernels
//===----------------------------------------------------------------------===//
// Computes B = Relu(A).
template <typename T>
static AsyncValueRef<Chain> Relu(const DenseHostTensor& A, DenseHostTensor* B,
                                 const ExecutionContext& exec_ctx) {
  auto fn = [](auto& a, auto& b) { return a.cwiseMax(static_cast<T>(0)); };
  return ::tfrt::compat::UnaryEigenKernelAsync<T, T>(A, B, std::move(fn),
                                                     exec_ctx);
}

//===----------------------------------------------------------------------===//
// CPU BiasAdd kernels
//===----------------------------------------------------------------------===//
// A special case of tf.add where bias is restricted to be 1-D.
// Currently only support NHWC data format.
template <typename T, size_t RANK>
static AsyncValueRef<Chain> BiasAdd(const DenseHostTensor& input,
                                    const DenseHostTensor& bias,
                                    DenseHostTensor* output,
                                    const ExecutionContext& exec_ctx) {
  DHTIndexableView<T, RANK> input_view(&input);
  MutableDHTIndexableView<T, RANK> output_view(output);
  DHTIndexableView<T, 1> bias_view(&bias);
  const auto& shape_input = input_view.FixedShape();
  const auto& shape_bias = bias_view.FixedShape();
  const auto& shape_output = output_view.FixedShape();
  if (shape_input != shape_output) {
    return EmitErrorAsync(exec_ctx, "unexpected output shape");
  }
  if (shape_bias[0] != shape_input[RANK - 1]) {
    return EmitErrorAsync(exec_ctx, "bias shape does not match input shape");
  }
  // Reshape bias to the shape of input. Broadcast along the last axis of input.
  Eigen::array<Eigen::Index, RANK> reshape_dims;
  Eigen::array<Eigen::Index, RANK> broadcast_dims;
  for (size_t i = 0; i < RANK - 1; ++i) {
    reshape_dims[i] = static_cast<Eigen::Index>(1);
    broadcast_dims[i] = static_cast<Eigen::Index>(shape_input[i]);
  }
  reshape_dims[RANK - 1] = static_cast<Eigen::Index>(shape_bias[0]);
  broadcast_dims[RANK - 1] = static_cast<Eigen::Index>(1);
  auto input_t = AsEigenConstTensor(input_view);
  auto bias_t = AsEigenConstTensor(bias_view);
  auto output_t = AsEigenTensor(output_view);
  auto expr = input_t + bias_t.reshape(reshape_dims).broadcast(broadcast_dims);
  return AsyncAssign(
      exec_ctx.host()->GetOrCreateSharedContext<EigenHostContext>(),
      std::move(output_t), std::move(expr),
      KeepBuffers::alive(&input, &bias, output));
}

https://github.com/tensorflow/runtime/blob/master/backends/cpu/lib/kernels/cpu_kernels.h
27
• 28. Dialects we can see now
• tfrt: we know what this is for
• tfrt_test: to test tfrt
• tfrt_data: tf.data, to deal with input pipeline
• tfrt_dht: dense host tensor
• corert: Core Runtime, eager execution
• ts: tensor shape
• coo: COOrdinate list sparse tensor
• eigen: wrapper around the eigen library
• btf: binary tensor format
• cuda: you know what cuda means :-)
28
• 29. Concluding Remarks
• MLIR related talks and publications, https://mlir.llvm.org/talks/
• We scratched the surface of the TFRT host runtime and core runtime. There are more details:
• threading model: thread pool / work queue,
• memory allocation: tcmalloc for servers, other small allocators for embedded systems,
• non-strict execution, and
• registers: the BEF executor is a register machine
• we didn’t touch other important components such as the device runtimes, esp. the GPU part, and the distributed environment
29
• 31. Device Runtime Design Principles
• A thin wrapper of low-level (driver) APIs, exposing device capabilities to the graph compiler
• Memory Allocation
• Async host <-> device transfer, and kernel execution
• Dependency management
• Focus on mechanism instead of policy
• E.g. No built-in special-purpose streams for GPU support:
• For pure eager execution, can default to one stream for everything
• For tf.function execution, compiler can pick streams
31