It is the presentation file used by Jim Huang (jserv) at OSDC.tw 2009. New compiler technologies are invisible yet deeply integrated into the world around us, and we can enrich the experience by leveraging LLVM.
ZynqMP Boot and Power Management (Mr. Vengineer)
This is the material I used at the Zynq UltraScale+ MPSoC SIG on Friday, 20 February 2016.
Addendum (2016.05.08): Noted that an implementation for the Zynq UltraScale+ MPSoC has been added to the official ARM Trusted Firmware site.
Launch the First Process in Linux System (Jian-Hong Pan)
The session: https://coscup.org/2022/en/session/AGCMDJ
After the Linux kernel boots, it tries to launch the first process, "init", in user space. Then the system begins the featured journey of the Linux distribution.
This talk takes Busybox as the example and shows how the Linux kernel finds the "init" that points to Busybox, what Busybox does, and how it gets the console, building something like a simple Linux system along the way.
Before the Linux kernel launches the "init" process, the drivers/modules corresponding to the file system and storage must be loaded so that "init" can be found. Besides, to mount the root file system correctly, the kernel boot command line must include the root device and file-system format parameters.
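For instance, a purely illustrative boot command line for an ext4 root file system on the second SD-card partition could look like this (the device name and console depend on the target board):

```
root=/dev/mmcblk0p2 rootfstype=ext4 rootwait init=/sbin/init console=ttyS0,115200
```

Here `root=` names the root device, `rootfstype=` gives the file-system format, and `init=` selects the first process to launch.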
On the other hand, Busybox, which "init" points to, is a lightweight program with rich functions, just like a Swiss Army knife. So it is usually used in simple environments, such as embedded Linux systems.
The talk includes a demo on a virtual machine first, then on a Raspberry Pi.
Drafts:
* https://hackmd.io/@starnight/Busbox_as_the_init
* https://hackmd.io/@starnight/Build_Alpines_Root_Filesystem_Bootstrap
Related idea: https://hackmd.io/@starnight/Systems_init_and_Containers_COMMAND_Dockerfiles_CMD
This document discusses using the GNU Debugger (GDB) to debug programs. It begins with an introduction to GDB and why it is useful. Examples are then provided of using GDB for interactive debugging, examining core dumps, patching binaries, and advanced tricks. A real-world case study demonstrates using GDB to debug a crash in the GNU C library by examining assembly code and source-level debugging with debug symbols. The document concludes by mentioning another case study involving hijacking file descriptors in GDB.
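A typical interactive flow of the kind described looks roughly like the following session sketch (commands only; `crash` is a hypothetical example binary built with debug symbols):

```
$ gcc -g -O0 crash.c -o crash
$ gdb ./crash
(gdb) run                # reproduce the crash under the debugger
(gdb) bt                 # backtrace at the faulting instruction
(gdb) frame 1            # select a frame with source context
(gdb) print ptr          # inspect a suspect variable
(gdb) x/8xb ptr          # examine raw memory at that address
(gdb) quit
```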
The document discusses software-driven verification using Xilinx's xsim simulator. It describes the Xilinx Simulator Interface (XSI), which allows a C/C++ program to act as a testbench for an HDL design in xsim. It details XSI functions for getting port numbers, setting and getting port values, and running and controlling the simulation from C++. It also discusses calling XSI functions through dynamic linking, and using SystemVerilog DPI to access the DUT directly from C++.
TVM uses Verilator and DPI to connect accelerator models written in Verilog/Chisel to Python code. It initializes the hardware model and controls simulation using methods such as SimLaunch, SimWait, and SimResume. The Python code loads the accelerator module, allocates memory, and runs the accelerator by calling driver functions that use the DPI to initialize, launch, and wait for completion of the accelerator. This allows accelerators developed in Verilog/Chisel to be tested from Python.
Cloud Deep Learning Chips Training & Inference (Mr. Vengineer)
This document summarizes various chips for deep learning training and inference in the cloud from companies such as Google, Intel, Habana Labs, Alibaba, and Graphcore. It provides information on the specs and capabilities of each chip, such as the memory type and TFLOPS, and links to product pages and documentation. It also discusses collaborations between companies on projects like Glow, ONNX, and OCP accelerator modules.
Glow is a compiler and execution engine for neural networks created by Facebook. It takes a high-level graph representation of a neural network and compiles it into efficient machine code for different hardware backends like CPU and OpenCL. The key steps in Glow include loading a model, optimizing the graph, lowering it to a low-level IR, scheduling operations to minimize memory usage, generating instructions for the backend, and performing optimizations specific to the target. Glow aims to provide a portable way to deploy neural networks across different hardware platforms.
Bridge TensorFlow to run on Intel nGraph backends (v0.4) (Mr. Vengineer)
This document discusses bridging TensorFlow to run on Intel nGraph backends. It summarizes various optimization passes used in the nGraph-TensorFlow integration, including passes to liberate nodes from placement constraints, confirm placement, cluster the graph, and encapsulate clusters. Key points:
- NGraphLiberatePass and NGraphConfirmPass run during the PRE_PLACEMENT phase to handle nGraph placement
- NGraphClusterPass runs during POST_REWRITE_FOR_EXEC to cluster the graph into subgraphs, similar to XLA partitioning
- NGraphEncapsulatePass encapsulates clusters into NGraphEncapsulateOp nodes, analogous to XLA's use of _XlaLaunchOp
Bridge TensorFlow to run on Intel nGraph backends (v0.5) (Mr. Vengineer)
The document describes how the nGraph TensorFlow bridge works by rewriting TensorFlow graphs to run on Intel nGraph backends. It discusses how optimization passes modify the graph in several phases: 1) capturing TensorFlow variables as nGraph variables, 2) marking/assigning/deassigning nodes to clusters, 3) encapsulating clusters into NGraphEncapsulateOp nodes to run subgraphs on nGraph. Key classes and files involved, such as NGraphVariableCapturePass and NGraphEncapsulatePass, are described, along with how they implement the different rewriting phases to prepare the graph for nGraph execution.
In TensorFlow XLA, the XLA Client can now be used from Python.
Also added notes about the SysML paper (JAX @ Google), presented in February 2018.
Tiramisu is a code optimization and generation framework that can be integrated into custom compilers. It supports various backends including multi-CPU (using LLVM), GPU (using CUDA), distributed systems (using MPI), and FPGAs (using Xilinx Vivado HLS). Tiramisu uses polyhedral representations to support irregular domains beyond just rectangles. The document provides an overview of Tiramisu and discusses challenges related to supporting different platforms, memory dependencies, efficient code generation, and representations. It also mentions that Tiramisu uses Halide and ISL.
An overview of Tiramisu: A Code Optimization Framework for High Performance Systems
https://www.csail.mit.edu/research/tiramisu-framework-code-optimization-and-code-generation
Since there is almost no documentation, I analyzed the source code and looked into what the sample programs do.
Tensor Comprehensions is a tool from Facebook AI Research for writing custom neural network layers. It allows non-experts to write layers that achieve good performance. It can be used with frameworks like PyTorch and Caffe2. Users define layers using a simple domain-specific language. Tensor Comprehensions then optimizes and compiles the layers for fast GPU execution.
TensorFlow Lite (r1.5) & Android 8.1 Neural Networks API (Mr. Vengineer)
This document discusses TensorFlow Lite 1.5 and the Android 8.1 Neural Networks API. It provides an overview of converting TensorFlow models to the TensorFlow Lite format using conversion tools, and running those models on Android using the TensorFlow Lite and Neural Networks APIs. The key steps are converting TensorFlow models to TensorFlow Lite format, creating an interpreter to run the model, and using the interpreter and Neural Networks API to execute the model on Android hardware like the CPU.
4. Software Driven Verification
● Verify hardware (RTL, etc.) using software (test programs)
[Diagram] Top Test Bench: DUT (Design Under Test, described in RTL etc.), Model, Driver/Checker/Monitor, Test Program
5. What about SystemVerilog?
● UVM (Universal Verification Methodology): UVM 2020-1.1
[Diagram] Top Test Bench (SystemVerilog): DUT (SystemVerilog), Model (SystemVerilog), Test Program (SystemVerilog)
Currently usable only with commercial HDL simulators
6. What is Verilator? (https://github.com/verilator)
Welcome to Verilator, the fastest Verilog/SystemVerilog simulator.
● Accepts synthesizable Verilog or SystemVerilog
● Performs lint code-quality checks
● Compiles into multithreaded C++, or SystemC
● Creates XML to front-end your own tools
On the testbench side, you can use:
● multithreaded C++
● SystemC
9. What about Verilator?
● Only SystemVerilog RTL descriptions can be used for the DUT
[Diagram] Top Test Bench (C++/SystemC): DUT (SystemVerilog RTL), Model (C++/SystemC), Test Program (C++/SystemC)
10. Verilator + SystemC
● Only SystemVerilog RTL descriptions can be used for the DUT
[Diagram] Top Test Bench (SystemC): DUT (SystemVerilog RTL), Model (SystemC), Test Program (C++)
12. Verilator + SystemC
● A case where the DUT (Memory) is accessed
[Diagram] Top Test Bench (SystemC): DUT (Memory, SystemVerilog RTL), Bus Functional Model (SystemC, performing Read and Write), Test Program (SystemC)
13. module top // a memory, but named "top" following the Verilator convention
(
  input  logic        clk,
  input  logic        reset,
  input  logic [15:0] addr,
  input  logic        cs,
  input  logic        rw,
  input  logic [31:0] data_in,
  output logic        ready,
  output logic [31:0] data_out
);
  localparam ram_size = (17'h10000>>2);
  logic [31:0] ram[ram_size];
  enum {STATE_IDLE, STATE_RUN, STATE_DONE} state;

  always_ff @(posedge clk) begin
    if (reset == 1'b1)
      state <= STATE_IDLE;
    else if (cs == 1'b1 && state == STATE_IDLE)
      state <= STATE_RUN;
    else if (cs == 1'b1 && state == STATE_RUN)
      state <= STATE_DONE;
    else if (cs == 1'b0)
      state <= STATE_IDLE;
  end

  always_ff @(posedge clk) begin
    if (reset == 1'b1) begin
      data_out <= 32'h0000_0000;
      ready    <= 1'b0;
    end
    else if (state == STATE_RUN) begin
      if (rw == 1'b1)
        data_out <= ram[addr[15:2]];
      else
        ram[addr[15:2]] <= data_in;
      ready <= 1'b1;
    end
    else begin
      data_out <= 32'h0000_0000;
      ready    <= 1'b0;
    end
  end
endmodule
19. Verilator + SystemC
● By putting the Test Program in a separate file, various tests can be run
[Diagram] Top Test Bench (SystemC): DUT (Memory, SystemVerilog RTL), Bus Functional Model (SystemC, performing Read and Write), and many interchangeable Test Programs (SystemC)
21. Verilator + SystemC + SystemVerilog DPI
● Using SystemVerilog DPI, the test can access the inside of the DUT directly
[Diagram] Top Test Bench (SystemC): DUT (Memory, SystemVerilog RTL) reached via SystemVerilog DPI, Bus Functional Model (SystemC, performing Read and Write), and many interchangeable Test Programs (SystemC)