This presentation introduces the Linaro OP-TEE project in the context of the Automotive Grade Linux (AGL) distribution. This TEE is currently considered a potential key element for providing security enforcement in the scope of Software OTA for the AGL distribution.
This brief slide set was presented at the AGL Face to Face Technical Meeting, 25 – 27 May, Vannes, France.
Rust is a multi-paradigm systems programming language focused on safety, especially safe concurrency. It began in 2006 as a personal project of Graydon Hoare, was later sponsored by Mozilla, and reached version 1.0 in 2015. Rust aims for speed, concurrency, and safety by avoiding garbage collection and enforcing memory safety and thread safety through its ownership and borrowing system. While syntactically similar to C++, Rust emphasizes writing safe code and preventing common bugs like buffer overflows.
This document discusses Jenkins Pipelines, which allow defining continuous integration and delivery (CI/CD) pipelines as code. Key points:
- Pipelines are defined using a Groovy domain-specific language (DSL) for stages, steps, and environment configuration.
- This provides configuration as code that is version controlled and reusable across projects.
- Jenkins plugins support running builds and tests in parallel across Docker containers.
- Notifications can be sent to services like Slack on failure.
- The Blue Ocean UI in Jenkins focuses on visualization of pipeline runs.
The document discusses the Disruptor, a data structure and workflow for high-performance concurrent programming without contention. The Disruptor uses a ring buffer to pass messages between threads very quickly and in parallel. Publishers insert events into the ring buffer, while batch event processors read the events in batches and process them in parallel threads. The Disruptor framework encourages modeling the problem domain and provides reliable ordering, parallelism, and high performance.
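The ring-buffer idea described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it is single-threaded, it leaves out the memory barriers and wait strategies that a real implementation depends on, and the names `RingBuffer`, `add_consumer`, `publish`, and `read_batch` are hypothetical and not the Disruptor library's API.

```python
class RingBuffer:
    """Single-producer ring buffer with per-consumer read cursors (illustrative only)."""

    def __init__(self, size):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self._slots = [None] * size
        self._mask = size - 1      # lets us wrap a sequence number with a bitwise AND
        self._next_seq = 0         # sequence number the publisher will claim next
        self._cursors = {}         # consumer name -> next sequence number to read

    def add_consumer(self, name):
        # A new consumer starts at the current publish position.
        self._cursors[name] = self._next_seq

    def publish(self, event):
        # Refuse to overwrite a slot the slowest consumer has not read yet.
        slowest = min(self._cursors.values(), default=self._next_seq)
        if self._next_seq - slowest >= len(self._slots):
            return False
        self._slots[self._next_seq & self._mask] = event
        self._next_seq += 1
        return True

    def read_batch(self, name):
        # Hand over everything published since this consumer's cursor, in order.
        start = self._cursors[name]
        batch = [self._slots[seq & self._mask] for seq in range(start, self._next_seq)]
        self._cursors[name] = self._next_seq
        return batch
```

Each consumer keeps its own cursor, so several consumers can read the same events independently, and a consumer that falls behind catches up by taking a larger batch on its next read.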
This document provides an overview of Jenkins, an open-source tool for continuous integration and continuous delivery. It discusses key Jenkins concepts like architecture, pipelines, and shared libraries. Jenkins allows integrating multiple stages of development through continuous integration and delivery. It has a master-slave architecture and supports defining automated build processes through pipelines implemented as code.
TensorFlow is the most popular machine learning framework nowadays. TensorFlow Lite (TFLite), open sourced in late 2017, is TensorFlow’s runtime designed for mobile devices, especially Android phones. TFLite is becoming more and more mature. Among the most interesting components introduced recently are its GPU delegate and its new NNAPI delegate. The GPU delegate uses OpenGL ES compute shaders on Android platforms and Metal shaders on iOS devices. The original NNAPI delegate is an all-or-nothing design (if one of the ops in the compute graph is not supported by NNAPI, the whole graph is not delegated). The new one is a per-op design: when an op in a graph is not supported by NNAPI, that op automatically falls back to the CPU runtime. I’ll give a quick review of TFLite and its interpreter, then walk the audience through example usage of the two delegates and the important parts of their source code.
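The difference between the all-or-nothing and per-op designs can be illustrated with a small Python sketch. It is not TFLite code: it assumes the graph is a simple linear sequence of ops, and the function names and the op names in the usage note are hypothetical.

```python
from itertools import groupby


def all_or_nothing(ops, supported):
    """Delegate the whole graph only if every op is supported; otherwise run it all on the CPU."""
    target = "delegate" if all(op in supported for op in ops) else "cpu"
    return [(target, list(ops))]


def per_op(ops, supported):
    """Split the op sequence into consecutive runs, each sent to the delegate or to the CPU."""
    return [
        ("delegate" if ok else "cpu", list(run))
        for ok, run in groupby(ops, key=lambda op: op in supported)
    ]
```

With `ops = ["CONV_2D", "RELU", "MY_CUSTOM_OP", "SOFTMAX"]` and a supported set that lacks `MY_CUSTOM_OP`, `all_or_nothing` places all four ops on the CPU, while `per_op` keeps `CONV_2D` and `RELU` on the delegate, runs only `MY_CUSTOM_OP` on the CPU, and returns `SOFTMAX` to the delegate.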
This document introduces Rust and provides an overview of its key concepts. It begins with an introduction to the presenter and agenda. It then covers basic terminology, common system programming errors, why Rust was created, how to install Rust, data types including primitive types, complex data structures, ownership and borrowing rules, lifetimes, and how to get involved in the Rust community. Key concepts discussed include Rust's type system, memory safety features, and package manager.
Brief overview of the Rust system programming language. Provides a concise introduction of its basic features, with an emphasis on its memory safety features (ownership, moves, borrowing) and programming style with generic functions, structures, and traits.
Android Audio HAL – Audio Architecture – Audio HAL interface – Audio Policy – Audio HAL compilation & verification – Overview of Tinyalsa
Android Video HAL – Camera Architecture – Overview of camera HAL interface – Overview of V4L2 – Enabling V4L2 in kernel – Camera HAL compilation and verification
Droidcon Berlin 2021 - With coroutines being the de facto way of exposing async work and streams of changes for Kotlin on Android, developers are obviously attempting to use the same approaches when moving their code to Multiplatform.
But due to the way the memory model differs between JVM and Kotlin Native, it can be a painful experience.
In this talk, we will take a deep dive into the Coroutine API for Kotlin Multiplatform. You will learn how to expose your API with Coroutines while working with the Kotlin Native memory model instead of against it, and avoid the dragons along the way.
Kafka’s New Control Plane: The Quorum Controller | Colin McCabe, Confluent | HostedbyConfluent
Currently, Apache Kafka® uses Apache ZooKeeper™ to store its metadata. Data such as the location of partitions and the configuration of topics are stored outside of Kafka itself, in a separate ZooKeeper cluster. In 2019, we outlined a plan to break this dependency and bring metadata management into Kafka itself through a dynamic service that runs inside the Kafka Cluster. We call this the Quorum Controller.
In this talk, we’ll look at how the Quorum Controller works and how it integrates with other parts of the next-generation Kafka architecture, such as the Raft quorum and snapshotting mechanism. We’ll also explain how the Quorum Controller will simplify operations, improve security, and enhance scalability and performance.
Finally, we’ll look at some of the practicalities, such as how to monitor and run the Quorum Controller yourself. We’ll talk about some of the performance gains we’ve seen, and our plans for the future.
Kotlin Coroutines and Android sitting in a tree | Kai Koenig
Since the release of Kotlin 1.1 there is now the new language feature of Kotlin Coroutines available for use in Java and Android projects. Coroutines are a new way to write asynchronous and non-blocking code. They can be thought of as light-weight threads without having to deal with all the problems that threads bring to the table.
A lot of developers think that Kotlin Coroutines are mainly or only useful for Kotlin on the JVM, but that’s not true. There are a variety of use cases in which the application of Coroutines can make a lot of sense on Android.
This talk introduces the ideas behind Kotlin Coroutines, shows how to use them in Kotlin code for both the JVM and Android via the kotlinx-coroutines APIs, and then explores specific applications in Android. Part of this is a deeper look into the use of Coroutines in higher-level frameworks such as AsyncAwait and Anko.
The Nextcloud Roadmap for Secure Team Collaboration | Univention GmbH
Nextcloud is an open source content collaboration platform that provides file sync and sharing, file server capabilities, and groupware functionality as an alternative to proprietary services like Dropbox, Google Suite, and Office 365. It allows for decentralization of data storage across trusted servers using open cloud mesh federation with end-to-end encryption and optional key recovery. Nextcloud supports collaboration across iOS, Android, Mac, Windows, Linux, and through CalDAV/CardDAV integration with email clients and Outlook/Thunderbird plugins.
Android uses cgroups to monitor system memory usage via the Low Memory Killer daemon and to group processes for effective CPU sharing. Cgroups are used to create mount points for memory and CPU control groups. The LMK daemon uses cgroups to receive memory pressure events and kill processes as needed. Init.rc uses cgroups to create groups for real-time and background tasks and assign CPU shares. Android further groups processes by scheduling policy for scheduling priorities.
Git is a free and open source distributed version control system that allows developers to work collaboratively without needing centralized connectivity. It provides powerful branching capabilities that make branches cheap to create and easy to merge. Common Git commands include init, clone, status, add, commit, log, remote, fetch, push, and pull. An example scenario demonstrates how multiple developers can clone a remote repository, make changes in their local repos, fetch and push changes between local and remote repos, and merge branches.
* Know the reasons why various operating systems exist and how they function for dedicated purposes
* Understand the basic concepts while building system software from scratch
* How can we benefit from cheap ARM boards and the related open source tools?
  - Raspberry Pi & STM32F4-Discovery
This is a brief description of feature flags (also known as feature toggles or feature switches) used in combination with version management and branching strategies to streamline and optimize CI/CD pipelines.
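As a rough illustration of the idea, a feature flag can be as small as a runtime check around code that is already merged but not yet released. The sketch below is an assumption-laden example, not taken from the slides: the `FEATURE_` environment-variable convention, the `bulk_discount` flag, and both function names are hypothetical.

```python
import os


def is_enabled(flag, environ=None):
    """Return True when the flag's environment variable holds a truthy value."""
    env = os.environ if environ is None else environ
    value = env.get(f"FEATURE_{flag.upper()}", "")
    return value.strip().lower() in {"1", "true", "on", "yes"}


def checkout_total(prices, environ=None):
    """Sum the prices, applying the flagged discount only when it is switched on."""
    total = sum(prices)
    # The unfinished feature ships on the main branch but stays dark until the flag is flipped.
    if is_enabled("bulk_discount", environ) and len(prices) >= 3:
        total *= 0.9
    return round(total, 2)
```

Because the new behavior is guarded at runtime, the work can be merged to the main branch early instead of living on a long-running feature branch, and releasing it becomes a configuration change rather than a deployment.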
If you still haven't heard of it, there is a new star in the JVM sky: Kotlin. This short presentation serves as an intro for those who want to hear what all the fuss is about and dive deeper into this new alternative to Java.
Git is a distributed version control system that was created by Linus Torvalds as an improvement over centralized systems like Subversion; it works by tracking changes to files and committing snapshots of changes locally or to a remote server, and has a flexible branching workflow that allows users to work independently and merge changes together. The document provides an introduction to basic Git concepts, commands, and workflows for versioning code and collaborating through branching and merging changes.
Using GMock to give the developer a calm and well-fed life. An overview of both the tip and some of the underwater parts of GMock. A walkthrough of the framework's capabilities through examples.
Rust and C++ are both systems programming languages but Rust provides better memory safety while maintaining performance. Rust uses a borrow checker to catch errors at compile time and disallows null references, while C++ relies more on programmer discipline. Rust also guarantees thread safety by preventing data races through its ownership and borrowing model.
A brief introduction to Git:
- Starting a new project or cloning an existing one
- First commits and basic commands
- Usage examples
Author: Valerio Radice
Tags: Git tutorial, ITA, Italian
OWASP Security Logging API easily extends your current log4j and logback logging with impressive features helpful for security, diagnostics/forensics, and compliance. Slide deck presentation from OWASP AppSecEU 2016 in Rome.
Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices? | Koan-Sin Tan
The document discusses using Neural Engine on A11 and A12 devices. It provides log outputs showing Neural Engine (ANE) being used on an iPhone Xs Max and not being used on an iPhone 8 Plus and iPhone 6s, which have A11 and earlier chips. It also shares code for checking the compute units and provides links to example projects for using Neural Engine on Core ML models.
The document discusses Git workflows, comparing centralized and feature branch workflows. It then describes Vincent Driessen's branching model, which uses two main branches (master and develop) and three supporting branch types (feature, release, hotfix). The master branch is stable and used for production, while develop is where completed features are integrated. Feature branches branch off develop for new work, and release branches prepare releases and are merged into both develop and master. Hotfix branches fix production issues. The model aims to support collaboration while keeping branches stable. Special cases in applying the model are also addressed.
Dynamic Instrumentation - OpenEBS Golang Meetup July 2017 | OpenEBS
The slides were presented by Jeffry Molanus, the CTO of OpenEBS, at the Golang Meetup. OpenEBS is open source cloud native storage. OpenEBS delivers storage and storage services to containerized environments. OpenEBS allows stateful workloads to be managed more like stateless containers. OpenEBS storage services include: per container (or pod) QoS SLAs, tiering and replica policies across AZs and environments, and predictable and scalable performance. Our vision is simple: let storage and storage services for persistent workloads be so fully integrated into the environment, and hence managed so automatically, that they almost disappear into the background as just another infrastructure service that works.
Linux Performance Analysis: New Tools and Old Secrets | Brendan Gregg
Talk for USENIX/LISA2014 by Brendan Gregg, Netflix. At Netflix performance is crucial, and we use many high to low level tools to analyze our stack in different ways. In this talk, I will introduce new system observability tools we are using at Netflix, which I've ported from my DTraceToolkit, and are intended for our Linux 3.2 cloud instances. These show that Linux can do more than you may think, by using creative hacks and workarounds with existing kernel features (ftrace, perf_events). While these are solving issues on current versions of Linux, I'll also briefly summarize the future in this space: eBPF, ktap, SystemTap, sysdig, etc.
The document provides a summary of 15 lectures on operating systems topics:
1. The first few lectures introduce concepts like computer organization, boot process, need for an operating system, and basic OS definitions.
2. Later lectures cover additional OS concepts like multiprogramming, multitasking, multiprocessing, memory protection, and interrupts.
3. The document discusses process management topics like process states, context switching, scheduling, and inter-process communication using pipes.
This document discusses multithreading and concurrency in .NET. It covers key concepts like processes and threads, and how they relate on an operating system level. It also discusses the Thread Pool, Task Parallel Library (TPL), Tasks, Parallel LINQ (PLINQ), and asynchronous programming patterns in .NET like async/await. Examples are provided for common threading techniques like producer/consumer and using the Timer class. Overall it serves as a comprehensive overview of multithreading and concurrency primitives available in the .NET framework.
The document discusses using threads instead of processes to handle concurrent connections in a server. It introduces POSIX threads (Pthreads) and describes five basic thread functions: pthread_create to create new threads; pthread_join to wait for a thread to terminate; pthread_self to get the calling thread ID; pthread_detach to change a thread from joinable to detached; and pthread_exit for a thread to terminate. It then shows how to recode a client and server example using threads instead of fork, including properly passing arguments to new threads.
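The create/join pattern and the point about passing arguments can be sketched with Python's `threading` module. This is an analogue only, not the Pthreads C API the document covers, and the `handle` and `serve` names and the integer connection ids are hypothetical stand-ins for real sockets.

```python
import threading


def handle(conn_id, results):
    # Each thread receives its own conn_id value through args, so no thread
    # reads a variable that the main thread is still changing.
    results[conn_id] = f"served {conn_id} on thread {threading.get_ident()}"


def serve(conn_ids):
    """Start one thread per connection id, wait for all of them, and return their results."""
    results = {}
    threads = [threading.Thread(target=handle, args=(cid, results)) for cid in conn_ids]
    for t in threads:
        t.start()   # plays the role of pthread_create
    for t in threads:
        t.join()    # plays the role of pthread_join
    return results
```

Here `threading.get_ident()` stands in for `pthread_self`. Handing each thread its own value mirrors the document's point about passing arguments properly: the new thread should not depend on a variable that the creating thread may overwrite before the new thread reads it.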
This document discusses building a virtual platform for the OpenRISC architecture using SystemC and transaction-level modeling. It covers setting up the toolchain, writing test programs, and simulating the platform using event-driven or cycle-accurate simulation with Icarus Verilog or the Vorpsoc simulator. The virtual platform allows fast development and debugging of OpenRISC code without requiring physical hardware.
44CON London 2015 - Reverse engineering and exploiting font rasterizers: the ... | 44CON
A stack-based buffer overflow vulnerability was discovered in FreeType's CFF rasterizer during fuzz testing. The vulnerability occurs when building the hintmap data structure in the cf2_hintmap_build function. By analyzing the source code, it appears the vulnerability is caused by insufficient bounds checking when accessing the hint mask array via the maskPtr pointer, allowing writes beyond the end of the allocated buffer. This highlights the ongoing need to fuzz test font parsing libraries given the complexity of font formats and opportunity for security issues.
This document provides an overview of the FreeRTOS real-time operating system. It discusses that FreeRTOS is an open source, portable, and royalty-free RTOS kernel that is used in various commercial and industrial applications. It describes some of FreeRTOS's key components like tasks, queues for inter-process communication, semaphores for synchronization, and its preemptive priority-based scheduler. The document also provides details on FreeRTOS configuration, task states, priorities, the idle task, and how tasks are created and controlled using task control blocks and API calls.
TensorFlow is a dataflow-like model that runs on a wide variety of hardware platforms. It uses tensors and a directed graph to describe computations. Operations are abstract computations implemented by kernels that run on different devices like CPUs and GPUs. The core C++ implementation defines the framework and kernel functions, while the Python implementation focuses on operations, training, and providing APIs. Additional libraries like Keras, TensorFlow Slim, Skflow, PrettyTensor, and TFLearn build on TensorFlow to provide higher-level abstractions.
This document provides tips and tricks for Fluent related to input/output (IO) and batch processing, case setup and meshing, solving, post-processing, and reporting. It discusses parallel IO strategies, batch processing options, check pointing, the .fluent auto-execute file, mesh modifications like non-conformal interfaces and periodic boundaries, and the utility of various mesh-related tools.
Industry - Program analysis and verification - Type-preserving Heap Profiler ... | ICSM 2011
Paper: Type-preserving Heap Profiler for C++
Authors: József Mihalicza, Zoltán Porkoláb and Ábel Gábor
Session: "Industry Track Session 4: Program analysis and Verification"
TensorFlow Lite is TensorFlow's lightweight solution for running machine learning models on mobile and embedded devices. It provides optimized operations for low latency and small binary size on these devices. TensorFlow Lite supports hardware acceleration using the Android Neural Networks API and contains a set of core operators, a new FlatBuffers-based model format, and a mobile-optimized interpreter. It allows converting models trained in TensorFlow to the TFLite format and running them efficiently on mobile.
Python bindings for SAF-AIS APIs offer many advantages to middleware developers, application developers, tool developers and testers. The bindings help to speed up the software development lifecycle and enable rapid deployment of architecture-independent components and services. This session will describe main principles guiding Python bindings implementation, and will have extensive in-depth application Python code examples using SAF-AIS services.
Week1 Electronic System-level ESL Design and SystemC Begin | 敬倫 林
This document provides an introduction and overview of electronic system level (ESL) design using SystemC. It begins with background on ESL design basics, system on chip design flows, and SystemC. It then provides 3 examples of SystemC code: a counter, traffic light, and simple bus. The counter example shows a basic module with clocked process. The traffic light demonstrates a finite state machine. The bus example illustrates an interface, master/slave devices, and memory mapped components communicating over a bus. Overall, the document serves as an introductory tutorial for designing and modeling electronic systems using the SystemC language.
This document provides an introduction and overview of standard library functions in C++. It discusses different header files like stdio.h, string.h, math.h, iostream.h, and ctype.h that contain commonly used functions. Examples of functions from each header file are listed, such as functions for input/output, string manipulation, mathematical operations, and character classification. Specific string and character related functions like isalpha, isdigit, toupper, and tolower are also explained with examples.
Flink Forward SF 2017: Eron Wright - Introducing Flink Tensorflow | Flink Forward
This session will introduce a new open-source project, Flink TensorFlow, that enables Flink programs to operate on data using TensorFlow machine learning models. Applications include real-time image processing, NLP, and anomaly detection. The session will:
- Introduce TensorFlow and describe its component model, which allows for model reuse across environments
- Demonstrate how to use TensorFlow models in Flink ML and Flink Streaming environments
- Present a roadmap and provide opportunities to contribute
This document discusses Linux performance analysis tools. It introduces tpoint, a tool for tracing Linux tracepoints. Some example one-liners are provided that demonstrate how to use tpoint to trace disk I/O and see the tasks and processes performing I/O. The document also summarizes ftrace, a Linux kernel tracing tool that can be used to analyze performance issues.
Exploring Your Apple M1 devices with Open Source Tools | Koan-Sin Tan
This document summarizes Koan-Sin Tan's presentation on exploring Apple M1 devices using open source tools. Tan has experience using open source software on Unix systems dating back to the 1970s. The presentation covers how the macOS kernel is based on Mach and has some open source components. It also discusses using IOKit on macOS to access sensor data from devices, including temperature readings from an M1 MacBook Pro. Tan provides code examples for retrieving sensor data and details challenges in accessing private APIs and sensor data on iOS devices.
Running TFLite on Your Mobile Devices, 2020 | Koan-Sin Tan
This document summarizes a presentation about running TensorFlow Lite (TFLite) on mobile devices. Some key points:
- The presenter has experience with open source software and machine learning frameworks like TensorFlow and TFLite.
- TFLite allows deploying machine learning models on mobile and embedded devices for low latency and privacy. It supports models from frameworks like TensorFlow and PyTorch.
- The presentation will cover using TFLite on Android and iOS, TFLite metadata and code generation, and hardware acceleration delegates like XNNPACK and CoreML.
- TFLite metadata provides information about model inputs, outputs, and preprocessing to make models easier to use. Code generation helps integrate models into
Exploring Thermal Related Stuff in iDevices using Open-Source Tool | Koan-Sin Tan
This is the era of so-called “dark silicon.” Thermal control is an important but seldom-discussed topic. I could not find public information on how iOS does it. The recent checkm8 exploit and the follow-on checkra1n enable jailbreaking of iPhone 5s – iPhone X devices running iOS 12.3 and up, so we can explore these devices with open-source tools.
Caffe2 is a deep learning framework that provides first-class support for mobile deployment, including on Android devices. It has demonstrated speeds of up to 24 FPS for style transfer models on high-end Android devices using an OpenGL backend. The ARM Compute Library backend, added recently, shows potential to outperform the CPU backend for models like SqueezeNet. Caffe2 offers backends like NNPACK, Eigen, OpenGL, ARM Compute Library, and NEON for Android. Building and running benchmarks on Android is possible using the provided build scripts.
This document discusses using TensorFlow on Android. It begins by introducing TensorFlow and how it works as a dataflow graph. It then discusses efforts to optimize TensorFlow for mobile and embedded devices through techniques like quantization and models like MobileNet that use depthwise separable convolutions. It shares experiences building and running TensorFlow models on Android, including benchmarking an Inception model and building a label_image demo. It also compares TensorFlow mobile efforts to other mobile deep learning frameworks like CoreML and the upcoming Android Neural Networks API.
1) The patches add support for managing CPU idle states using the generic PM domain framework and runtime PM. This provides a unified approach for idle management across all devices.
2) Key aspects include extending PM domains to support multiple idle levels, initializing CPU PM domains from device tree, and adding a governor to determine idle states based on wakeup times and QoS.
3) The changes allow the Linux kernel to directly control CPU and cluster idle states when firmware supports the OS-initiated suspend mode in the PSCI standard.
A peek into Python's Metaclass and Bytecode from a Smalltalk UserKoan-Sin Tan
The document discusses metaclasses and bytecode in Python from the perspective of a Smalltalk user. Smalltalk influenced Python's use of bytecode, though Python's metaclasses differ from Smalltalk. Metaclasses in Smalltalk determine the class of a class, with every class being an instance of its metaclass. In Python, the default metaclass is type, but some standard classes use non-type metaclasses defined in the abc module. The document also provides an overview of Smalltalk bytecode categories and examples, and compares it to the Python bytecode generated for simple methods.
Android Wear and the Future of SmartwatchKoan-Sin Tan
Android Wear is Google's platform for wearable computing, currently focused on smartwatches. It allows for voice commands, wearable apps, and data exchange/synchronization with Android phones. However, current Android Wear offerings are limited as peripherals to phones, lacking inspiration from other bands/watches and no framework like HealthKit. The future of Android Wear may bring better integration of notifications, sensors, health data standards and more imaginative form factors beyond watches. Overall, current smartwatch technology has not yet realized the full potential envisioned in science fiction.
The document discusses various CPU benchmarks used to evaluate performance, including their pros and cons. It notes that synthetic benchmarks like Dhrystone and Whetstone have limitations and are outdated. Better benchmarks measure real applications or standardized workloads like CoreMark, which aims to replace Dhrystone by testing common algorithms like linked lists and matrices. The document also cautions that benchmarks can be manipulated and advocates for transparency in benchmarking methodology and results.
Dark Silicon, Mobile Devices, and Possible Open-Source SolutionsKoan-Sin Tan
This document summarizes a presentation about dark silicon in mobile devices and possible open source solutions. It discusses how power and thermal constraints are more severe for mobile devices due to limited battery progress and no fans. It also covers big.LITTLE scheduling, thread-level parallelism challenges, and user-level threading libraries like AsyncTask. Finally, it notes that while some open source parallel programming frameworks exist, fully utilizing parallelism on mobile and addressing dark silicon remain challenges with no widely adopted solutions.
This document summarizes an amateur Smalltalk user's observations about Ruby's object model and bytecode freedom based on a presentation given in Taiwan. It begins with an introduction and outline. It then discusses the speaker's background in Smalltalk and Ruby. The remainder of the document explores Smalltalk-80's object model in detail and compares it to Ruby's object model and metaclass hierarchy.
2. • disclaimer: opinions are my own
• feel free to interrupt me if you have any questions during the presentation
• questions can be asked in Taiwanese, English, or Mandarin
• most of the TFRT materials are adapted from the TFRT deep dive in the MLIR design meeting [1] and the TFRT docs [2]
• code is as of around Aug 1, 2020 (git commit ecf1c20 [3])
[1] TFRT Deep Dive, slides - recording, https://mlir.llvm.org/talks/
[2] https://github.com/tensorflow/runtime/tree/master/documents
[3] https://github.com/tensorflow/runtime/commit/ecf1c20
3. who i am
• Used open source before the term “open source” was coined
• A software guy, learned to use Unix and open source software on VAX-11/780 running 4.3BSD
• Used to be a programming language junkie
• Worked on various system software, e.g., CPU scheduling and power management of non-CPU components
• Recently, working on NN performance on edge devices
• Contributed from time to time to TensorFlow Lite
• started a command line label_image for TFLite
https://gunkies.org/w/images/c/c1/DEC-VAX-11-780.jpg
4. What is TFRT
• TensorFlow Runtime (TFRT) is one of the two new MLIR runtimes that have emerged in 2020 so far.
• The other one is the Intermediate Representation Execution Environment, IREE. So far, TFRT seems to have better design documentation
• Both of them have mobile / edge environments in mind.
• I haven’t seen mobile accelerated code in TFRT yet.
• IREE has some Vulkan related code, and some simple code already works on Android
• ResNet GPU inference is reported to be 28% faster with TFRT than with the current TensorFlow runtime
• https://github.com/tensorflow/runtime, https://youtu.be/15tiQoPpuZ8
5. Build it
• if you follow the instructions described in README.md, it should just work. At least on x86_64 Linux.
• however, it’s not tested for non-Linux environments yet
• ssize_t and int64_t
• on Mac OS X: ssize_t: long, int64_t: long long
• current code mixes the use of ssize_t and int64_t
• test: one of the acclaimed features of TFRT, like MLIR, is its use of LLVM FileCheck
• my hacks: shape related (ssize_t) tests not fixed yet
• it’s not tested on non-x86 platforms, such as aarch64, either
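The ssize_t / int64_t issue on this slide can be checked with a few lines of C++. This is a minimal sketch, not TFRT code: kSameType, ToInt64 and ToSsize are hypothetical names introduced here. Two integer types can have the same width and still be distinct types, which matters as soon as templates, overloads, or pointers are involved.

```cpp
#include <sys/types.h>  // ssize_t (POSIX)

#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// True where both names alias the same type (e.g. both long), false where
// ssize_t is long and int64_t is long long, even if both are 64 bits wide.
constexpr bool kSameType = std::is_same<ssize_t, std::int64_t>::value;

// Assumption of this sketch: ssize_t never exceeds 64 bits.
static_assert(sizeof(ssize_t) <= sizeof(std::int64_t),
              "ssize_t is assumed to fit in int64_t");

// Hypothetical helper: widen explicitly at the boundary instead of
// assuming the two types are interchangeable.
inline std::int64_t ToInt64(ssize_t v) { return static_cast<std::int64_t>(v); }

// Hypothetical helper: narrow explicitly, checking the range in debug builds.
inline ssize_t ToSsize(std::int64_t v) {
  assert(v >= std::numeric_limits<ssize_t>::min() &&
         v <= std::numeric_limits<ssize_t>::max());
  return static_cast<ssize_t>(v);
}
```

Code that passes a pointer or a container of one type where the other is expected compiles only on platforms where kSameType is true, which is one way a Linux-only test setup can hide the problem.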
6. • The three key directories under the TFRT root directory are
• lib: Contains core TFRT infrastructure code
• backends: Contains device specific infrastructure and op/kernel implementations
• include: Contains public header files for core TFRT infrastructure
7. Walking thru the tutorial
• unfortunately, it seems it’s not easy to jump directly into the source code without having some background knowledge
• so we’ll walk thru the tutorial [1]
• What is in the tutorial
• print hello world
• print integer
• adding kernels
[1] https://github.com/tensorflow/runtime/blob/master/documents/tutorial.md
8. using tfrt and tfrt_test
hello.mlir
func @hello() {
%chain = tfrt.new.chain
// Create a string containing "hello world" and store it in %hello.
%hello = "tfrt_test.get_string"() { string_attr = "hello world" } : () -> !tfrt.string
// Print the string in %hello.
"tfrt_test.print_string"(%hello, %chain) : (!tfrt.string, !tfrt.chain) -> !tfrt.chain
tfrt.return
}
The @hello function above shows how to create and print a string. The text after each ‘:’ specifies the types involved:
• () -> !tfrt.string means that tfrt_test.get_string takes no arguments and returns a !tfrt.string. tfrt is an MLIR dialect prefix (or namespace) for TFRT
• (!tfrt.string, !tfrt.chain) -> !tfrt.chain means that tfrt_test.print_string takes two arguments (!tfrt.string and !tfrt.chain) and returns a !tfrt.chain. chain [1] is a TFRT abstraction to manage dependencies
[1] https://github.com/tensorflow/runtime/blob/master/documents/explicit_dependency.md
9. hello world in MLIR
func @stringconstant() -> !llvm<"[12 x i8]"> {
%1 = llvm.constant("Hello world!") : !llvm<"i8*">
// CHECK: ret [12 x i8] c"Hello world!"
llvm.return %1 : !llvm<"i8*">
}
func @main() {
%0 = llvm.constant(0) : !llvm.i64
%1 = call @stringconstant() : () -> !llvm<"[12 x i8]">
%2 = llvm.getelementptr %1[%0] : (!llvm<"[12 x i8]">, !llvm.i64) -> !llvm<"i8*">
%3 = llvm.bitcast %2 : !llvm<"i8*"> to !llvm<"i8*">
%32 = llvm.call @puts(%2) : (!llvm<"i8*">) -> !llvm.i32
return
}
func @puts(!llvm<"i8*">) -> !llvm.i32
• MLIR “standard dialect” doesn’t have I/O functions
• there is the LLVM dialect, so we can use LLVM to call a standard libc function
10. Hello integer
func @hello_integers() {
%chain = tfrt.new.chain
// Create an integer containing 42.
%forty_two = tfrt.constant.i32 42
// Print 42.
tfrt.print.i32 %forty_two, %chain
tfrt.return
}
• as stated in the tutorial, we can run other functions in the same module
• we can turn to more basic ones, such as integers or floating point numbers
• @hello_integers shows how to create and print integers
• This example does not have the verbose type information we saw in @hello because there are custom parsers for the tfrt.constant.i32 and tfrt.print.i32 kernels in basic_kernels.td
11. basic_kernels.td
• .td (target description) files are for LLVM TableGen [1]
[1] TableGen, https://llvm.org/docs/TableGen/
class ConstantOp<string suffix, Type baseType, Attr attr>
: TFRT_Op<"constant." # suffix, [NoSideEffect]> {
let summary = "host executor constant value constructor";
let arguments = (ins attr:$value);
let results = (outs baseType);
}
class PrintOp<string suffix, Type type> : TFRT_Op<"print." # suffix> {
let summary = "tfrt.print operation";
let description = [{
An operation takes a number input and a chain input.
It prints the number to stdout and returns a chain output.
The chain input must be the second operand.
Example:
%2 = tfrt.print.i32 %0, %1
}];
let arguments = (ins type, TFRT_ChainType);
let results = (outs TFRT_ChainType);
let assemblyFormat = "operands attr-dict";
let verifier = ?;
}
https://github.com/tensorflow/runtime/blob/master/include/tfrt/basic_kernels/opdefs/basic_kernels.td#L376-L390
https://github.com/tensorflow/runtime/blob/master/include/tfrt/basic_kernels/opdefs/basic_kernels.td#L58-L64
13. user defined kernels
func @print_coordinate() {
%chain = tfrt.new.chain
%two = tfrt.constant.i32 2
%four = tfrt.constant.i32 4
%coordinate = "my.create_coordinate"(%two, %four) : (i32, i32) -> !my.coordinate
"my.print_coordinate"(%coordinate, %chain) : (!my.coordinate, !tfrt.chain) -> !tfrt.chain
tfrt.return
}
coordinate.mlir shows several TFRT features:
• MLIR types that begin with an exclamation mark (!) are user-defined types like !my.coordinate, compared to built-in types like i32
• Kernels are just C++ functions with a name in MLIR: my.print_coordinate is the MLIR name for the C++ PrintCoordinate function
• Kernels may pass arbitrary user-defined types: my.create_coordinate passes a custom Coordinate struct to my.print_coordinate
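The idea that a kernel is a C++ function looked up by its MLIR name can be sketched with a small name-to-function table. Everything below is hypothetical and written for illustration: ToyKernelRegistry, FormatCoordinate, MakeToyRegistry and the kernel name my.format_coordinate are not TFRT APIs, and TFRT's real registry is type-erased rather than fixed to one signature.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical user-defined type, standing in for the tutorial's Coordinate.
struct Coordinate {
  int x;
  int y;
};

// Plain C++ functions that play the role of kernels.
Coordinate CreateCoordinate(int x, int y) { return Coordinate{x, y}; }

std::string FormatCoordinate(const Coordinate& c) {
  return "(" + std::to_string(c.x) + ", " + std::to_string(c.y) + ")";
}

// Hypothetical registry: maps an MLIR-style kernel name to a C++ callable.
// The signature is fixed to (int, int) -> std::string to keep the sketch short.
class ToyKernelRegistry {
 public:
  using Kernel = std::function<std::string(int, int)>;

  void AddKernel(const std::string& name, Kernel fn) {
    assert(fn && "a kernel must be callable");
    kernels_[name] = std::move(fn);
  }

  // Returns nullptr when no kernel is registered under `name`.
  const Kernel* Find(const std::string& name) const {
    auto it = kernels_.find(name);
    return it == kernels_.end() ? nullptr : &it->second;
  }

 private:
  std::map<std::string, Kernel> kernels_;
};

// Registers one kernel that builds a Coordinate and formats it.
ToyKernelRegistry MakeToyRegistry() {
  ToyKernelRegistry registry;
  registry.AddKernel("my.format_coordinate", [](int x, int y) {
    return FormatCoordinate(CreateCoordinate(x, y));
  });
  return registry;
}
```

An executor built on such a table would resolve each kernel name found in the program once, then call the function pointer directly, so the name is only a lookup key and carries no semantics of its own.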
14. to dig into some code we need more system information
16. • TensorFlow user passes into TFRT a TensorFlow graph created via high-level TensorFlow APIs, and
• TFRT then calls the MLIR-based graph compiler to optimize and lower the graph into BEF, a Binary Executable Format for TFRT graph execution (MLIR is the compiler infrastructure that we use to represent TFRT host programs).
• The blue arrows in the simplified TensorFlow training stack diagram show this flow.
17. • In the README.md we are told to build two binaries: tfrt_translate and bef_executor
• tfrt_translate
• The tfrt_translate program does round trip translation between MLIR and BEF, similar to an assembler and disassembler.
• bef_executor
• The bef_executor program is the execution driver of BEF files. It reads in a BEF file, sets up runtime, and asynchronously executes function(s) in that file.
18. TFRT Host Runtime
• Foundation of TFRT: schedules work on the host and devices
• Clean separation between host and device runtimes:
• Host runtime does not know anything about devices, just their runtimes (sets of kernels)
• Key design points:
• Fully asynchronous - kernel executions cannot block
• Excellent error propagation in the presence of asynchrony
• Performance as a first-class concern, for graph and eager
• Outline:
• Common runtime infrastructure
• Graph execution
• Op-by-op execution (“eager”)
19. Key Abstraction: AsyncValue
• Container for data or resources
• Not Tensor specific
• A “future” type, fulfilled with exactly one value, or an error
• Lock-free, low memory overhead, type erased, reference counted
• Helper class AsyncValueRef<T> provides type safety when contained type is known
• AsyncValues enable efficient asynchronous compute
• Asynchronous functions return unavailable AsyncValues
• Caller can schedule dependent computations with AsyncValue::AndThen()
• Caller need not block until AsyncValue becomes available
https://github.com/tensorflow/runtime/blob/master/include/tfrt/host_context/async_value.h
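The "future fulfilled exactly once, with continuations instead of blocking" idea can be sketched in a few lines. ToyAsyncValue below is a hypothetical, single-threaded stand-in written for this note. It models only availability and AndThen-style continuations, and assumes T is default-constructible. The real AsyncValue is lock-free, type erased, reference counted, and can also carry an error, none of which is modeled here.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical single-threaded sketch of a future with continuations.
template <typename T>
class ToyAsyncValue {
 public:
  bool IsAvailable() const { return available_; }

  // Fulfil the value exactly once, then run every queued continuation.
  void Emplace(T v) {
    assert(!available_ && "a future is fulfilled exactly once");
    value_ = std::move(v);
    available_ = true;
    std::vector<std::function<void()>> ready;
    ready.swap(waiters_);
    for (auto& fn : ready) fn();
  }

  const T& Get() const {
    assert(available_ && "value is not available yet");
    return value_;
  }

  // Run `fn` immediately if the value is available; otherwise queue it
  // so the caller never has to block.
  void AndThen(std::function<void()> fn) {
    if (available_) {
      fn();
    } else {
      waiters_.push_back(std::move(fn));
    }
  }

 private:
  bool available_ = false;
  T value_{};
  std::vector<std::function<void()>> waiters_;
};
```

A caller holding an unavailable ToyAsyncValue<int> registers the dependent work with AndThen and returns at once; the work runs later, inside the Emplace call that supplies the value.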
20. Kernels
• Kernel: unit of computation scheduled by the runtime
• Similar to kernel concept in current TensorFlow
• Kernels accept AsyncValue inputs and produce AsyncValue output
• Runtime coordinates dataflow of AsyncValues between kernels
• Outputs may not be immediately available, unlike current TensorFlow
• Runtime generally does not understand kernel semantics
// Kernel that adds two integers.
// AsyncKernelFrame holds the kernel’s arguments and results.
static void TFRTAdd(AsyncKernelFrame* frame) {
// Fetch the kernel’s 0th argument.
AsyncValue* arg1 = frame->GetArgAt(0);
// Fetch the kernel’s 1st argument.
AsyncValue* arg2 = frame->GetArgAt(1);
int v1 = arg1->get<int>();
int v2 = arg2->get<int>();
// Set the kernel’s 0th result.
frame->EmplaceResultAt<int>(0, v1 + v2);
}
https://github.com/tensorflow/runtime/blob/master/documents/tfrt_host_runtime_design.md
https://github.com/tensorflow/runtime/blob/master/lib/basic_kernels/integer_kernels.cc#L39-L45
https://github.com/tensorflow/runtime/blob/master/include/tfrt/host_context/kernel_utils.h#L61-L149
21. Host Program
• Host programs encode a dataflow graph
• Similar to GraphDef in current TensorFlow
• Expressed in MLIR. Typically compiler generated
• Designed for low-level dispatch efficiency
• Designed for compiler transformations and analysis, e.g.,
• Use dataflow analysis for buffer reuse
func @sample_function() -> i32 {
%one = tfrt.constant.i32 1 // Make AsyncValue with value 1
%two = tfrt.constant.i32 2 // Make AsyncValue with value 2
%three = tfrt.add.i32 %one, %two // Make AsyncValue with value 3 (1+2)
%ch0 = tfrt.new.chain
tfrt.print.i32 %three, %ch0 // Print AsyncValue %three
tfrt.return %three : i32 // Return AsyncValue %three
}
22. TFRT Binary Executable Format (BEF)
• BEF encodes a hardware-specific lowered graph function
• Primary interface between compiler and runtime
• Designed for efficient execution
• Low overhead: execute program by reading mmap’d byte array
• Persistent and stable: Compile once offline, run many times online. Great for inference use-cases
• Composed of sections, similar to ELF. Each section has its own format
• Extensible: BEF is versioned, reader ignores unknown sections, new versions may define new sections
https://github.com/tensorflow/runtime/blob/master/documents/binary_executable_format.md
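The "reader ignores unknown sections" property can be illustrated with a toy container format. The byte layout, section IDs, and names below (ToySectionId, ToySection, IndexSections) are assumptions made up for this sketch and are not the real BEF encoding, which is described in the linked document. The point is only that a length-prefixed section can be skipped without understanding its contents.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical section IDs known to this reader version.
enum ToySectionId : std::uint8_t { kStrings = 1, kKernels = 2 };

struct ToySection {
  std::size_t offset;  // start of the payload inside the buffer
  std::size_t length;  // payload size in bytes
};

// Assumed toy layout, NOT the real BEF encoding:
//   [id: 1 byte][length: 1 byte][payload: `length` bytes], repeated.
// Known sections are indexed; unknown IDs are skipped via their length.
// Returns false if the buffer is truncated.
bool IndexSections(const std::vector<std::uint8_t>& buf,
                   std::map<std::uint8_t, ToySection>* known) {
  assert(known != nullptr);
  std::size_t pos = 0;
  while (pos < buf.size()) {
    if (buf.size() - pos < 2) return false;  // header cut short
    const std::uint8_t id = buf[pos];
    const std::size_t length = buf[pos + 1];
    pos += 2;
    if (buf.size() - pos < length) return false;  // payload cut short
    if (id == kStrings || id == kKernels) {
      (*known)[id] = ToySection{pos, length};
    }
    pos += length;  // step over the payload, known or not
  }
  return true;
}
```

Because only offsets are recorded, the payload bytes are never copied, which matches the spirit of executing directly from a memory-mapped byte array.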
23. BEF Executor
• BEF Executor evaluates a BEF dataflow graph “executor” style:
• Not a bytecode-like interpreter: no concept of program counter
• “Strict” execution by default: run a kernel only when all its inputs are available
• Executor features:
• Lock-free: atomics instead of mutexes
• Non-blocking: defer dependent work with AsyncValue::AndThen
• Supports “non-strict” execution: may run a kernel when some of its inputs are available
• Good for efficiently forwarding unavailable inputs to outputs
• Key concepts:
• BEF: dataflow graph
• Kernel: dataflow node
• AsyncValues: dataflow edge
https://github.com/tensorflow/runtime/blob/master/lib/bef_executor/bef_interpreter.cc#L223-L254
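"Strict" dataflow execution without a program counter can be sketched with per-node counters of missing inputs. ToyNode and RunStrict are hypothetical names for this illustration. The sketch is single-threaded and uses a plain queue, whereas the slide describes the real executor as lock-free and non-blocking.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical dataflow node: a kernel plus the edges leaving it.
struct ToyNode {
  std::function<void()> kernel;
  std::vector<std::size_t> users;  // nodes that consume this node's output
  std::size_t num_inputs = 0;      // number of incoming edges
};

// Strict execution: a node runs only after all of its inputs are available.
// Readiness alone drives the order; there is no instruction pointer.
// Returns the order in which nodes ran.
std::vector<std::size_t> RunStrict(const std::vector<ToyNode>& graph) {
  std::vector<std::size_t> pending(graph.size());
  std::queue<std::size_t> ready;
  for (std::size_t i = 0; i < graph.size(); ++i) {
    pending[i] = graph[i].num_inputs;
    if (pending[i] == 0) ready.push(i);
  }
  std::vector<std::size_t> order;
  while (!ready.empty()) {
    const std::size_t n = ready.front();
    ready.pop();
    if (graph[n].kernel) graph[n].kernel();
    order.push_back(n);
    for (std::size_t user : graph[n].users) {
      assert(pending[user] > 0 && "edge count and num_inputs disagree");
      if (--pending[user] == 0) ready.push(user);  // last input arrived
    }
  }
  return order;
}
```

Wiring two constant nodes into one add node reproduces the shape of @sample_function from the Host Program slide: the add node becomes ready only after both constants have run.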
25. How about Core Runtime?
• Surely, we can do a similar walkthrough, but that would take more time
• Two things
• Op Execution API, Execute()
• BEF Executor can handle it too
void CoreRuntime::Impl::Execute(const ExecutionContext& exec_ctx,
string_view op_name, OpHandler* op_handler,
MutableArrayRef<TensorHandle> arguments,
const OpAttrsRef& attrs,
MutableArrayRef<TensorHandle> results,
AsyncValueRef<Chain>* chain) {
// Ask the op_handler to execute the op. If successful, we're done.
auto op_handle = op_handler->MakeOp(op_name);
if (op_handle) {
op_handle.get()(exec_ctx, arguments, attrs, results, chain);
return;
}
// Otherwise, we fail with an 'unknown op' error.
auto err =
EmitErrorAsync(exec_ctx, "op '" + op_name.str() + "' is not supported");
for (auto& result : results) result = TensorHandle(err.CopyRef());
if (chain) *chain = std::move(err);
}
https://github.com/tensorflow/runtime/blob/master/lib/core_runtime/core_runtime.cc#L124-L143
https://github.com/tensorflow/runtime/blob/master/documents/tfrt_op_by_op_execution_design.md
27. Device Runtime
CPU
//===----------------------------------------------------------------------===//
// CPU Relu kernels
//===----------------------------------------------------------------------===//
// Computes B = Relu(A).
template <typename T>
static AsyncValueRef<Chain> Relu(const DenseHostTensor& A, DenseHostTensor* B,
const ExecutionContext& exec_ctx) {
auto fn = [](auto& a, auto& b) { return a.cwiseMax(static_cast<T>(0)); };
return ::tfrt::compat::UnaryEigenKernelAsync<T, T>(A, B, std::move(fn),
exec_ctx);
}
//===----------------------------------------------------------------------===//
// CPU BiasAdd kernels
//===----------------------------------------------------------------------===//
// A special case of tf.add where bias is restricted to be 1-D.
// Currently only support NHWC data format.
template <typename T, size_t RANK>
static AsyncValueRef<Chain> BiasAdd(const DenseHostTensor& input,
const DenseHostTensor& bias,
DenseHostTensor* output,
const ExecutionContext& exec_ctx) {
DHTIndexableView<T, RANK> input_view(&input);
MutableDHTIndexableView<T, RANK> output_view(output);
DHTIndexableView<T, 1> bias_view(&bias);
const auto& shape_input = input_view.FixedShape();
const auto& shape_bias = bias_view.FixedShape();
const auto& shape_output = output_view.FixedShape();
if (shape_input != shape_output) {
return EmitErrorAsync(exec_ctx, "unexpected output shape");
}
if (shape_bias[0] != shape_input[RANK - 1]) {
return EmitErrorAsync(exec_ctx, "bias shape does not match input shape");
}
// Reshape bias to the shape of input. Broadcast along the last axis of input.
Eigen::array<Eigen::Index, RANK> reshape_dims;
Eigen::array<Eigen::Index, RANK> broadcast_dims;
for (size_t i = 0; i < RANK - 1; ++i) {
reshape_dims[i] = static_cast<Eigen::Index>(1);
broadcast_dims[i] = static_cast<Eigen::Index>(shape_input[i]);
}
reshape_dims[RANK - 1] = static_cast<Eigen::Index>(shape_bias[0]);
broadcast_dims[RANK - 1] = static_cast<Eigen::Index>(1);
auto input_t = AsEigenConstTensor(input_view);
auto bias_t = AsEigenConstTensor(bias_view);
auto output_t = AsEigenTensor(output_view);
auto expr = input_t + bias_t.reshape(reshape_dims).broadcast(broadcast_dims);
return AsyncAssign(
exec_ctx.host()->GetOrCreateSharedContext<EigenHostContext>(),
std::move(output_t), std::move(expr),
KeepBuffers::alive(&input, &bias, output));
}
https://github.com/tensorflow/runtime/blob/master/backends/cpu/lib/kernels/cpu_kernels.h
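What the reshape-and-broadcast expression in BiasAdd computes can be written out with plain loops. BiasAddLastAxis is a hypothetical function for illustration, not TFRT code. It assumes a contiguous buffer with channels as the last (fastest-varying) axis, as in NHWC, and it runs synchronously instead of returning an AsyncValueRef<Chain>.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Adds a 1-D bias to a tensor stored contiguously with channels last.
// Plain loops stand in for the Eigen reshape + broadcast expression.
// Returns false when the shapes are inconsistent.
bool BiasAddLastAxis(const std::vector<float>& input,
                     const std::vector<float>& bias,
                     std::vector<float>* output) {
  assert(output != nullptr);
  const std::size_t channels = bias.size();
  if (channels == 0 || input.size() % channels != 0) return false;
  output->resize(input.size());
  for (std::size_t i = 0; i < input.size(); ++i) {
    // Element i belongs to channel i % channels, so the bias vector is
    // effectively repeated along every other axis.
    (*output)[i] = input[i] + bias[i % channels];
  }
  return true;
}
```

The modulo indexing is the loop-level equivalent of reshaping the bias to (1, ..., 1, C) and broadcasting it over the leading dimensions of the input.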
28. Dialects we can see now
• tfrt: we know what this is for
• tfrt_test: to test tfrt
• tfrt_data: tf.data, to deal with input pipeline
• tfrt_dht: dense host tensor
• corert: Core Runtime, eager execution
• ts: tensor shape
• coo: COOrdinate list sparse tensor
• eigen: wrapper around the eigen library
• btf: binary tensor format
• cuda: you know what cuda means :-)
29. Concluding Remarks
• MLIR related talks and publications, https://mlir.llvm.org/talks/
• We scratched the surface of the TFRT host runtime and core runtime. There are more details:
• threading model: thread pool / work queue,
• memory allocation: tcmalloc for server, other small allocators for embedded systems,
• non-strict execution, and
• registers: BEF executor is a register machine
• we didn’t touch other important components such as device runtimes, esp. the GPU part, and the distributed environment
31. Device Runtime Design Principles
• A thin wrapper of low-level (driver) APIs, exposing device capabilities to graph compiler
• Memory Allocation
• Async host <-> device transfer, and kernel execution
• Dependency management
• Focus on mechanism instead of policy
• E.g. No built-in special-purpose streams for GPU support:
• For pure eager execution, can default to one stream for everything
• For tf.function execution, compiler can pick streams