University of Iowa | Mobile Sensing Laboratory
Static Memory Management for Efficient Mobile Sensing Applications
EMSOFT 2015
Farley Lai, Daniel Schmidt, Octav Chipara
Department of Computer Science
University of Iowa | Mobile Sensing Laboratory
Introduction
Emerging Mobile Sensing Applications
• A class of applications that process continuous input data streams and may produce continuous output streams
  – real-time processing
  – efficient resource management
[Figure: Speaker Identifier application. Components: Speech Recording, VAD, Feature Extraction, Speaker Models, HTTP Upload; stages labeled Sensing and Stream Processing]
The Memory Management Challenge
• Workload: stream operations on frames of samples
  – e.g., windowing, splitting, or appending
  – stream operations tend to be memory intensive
• Goal: implement stream operations efficiently
  – reduce memory footprint
  – reduce the number of memory accesses
• Challenges:
  – handle complex interactions between components
  – avoid unnecessary memory copies
  – enable data sharing between components
Approaches to Memory Management
• Dynamic memory management
  – specialized data structures to implement memory management
    • e.g., SigSeg [Girod, et al. 2008] – linked list of buffered samples
  – a level of indirection in accessing streaming data
• Static memory management
  – no runtime overhead
  – requires precise knowledge of the variable live ranges
    • difficult to achieve in complex applications
    • must be time-efficient to be included in compilers
[Girod2008] L. Girod, Y. Mei, R. Newton, S. Rost, A. Thiagarajan, H. Balakrishnan, and S. Madden, “XStream: a Signal-Oriented Data Stream Management System,” in ICDE, 2008.
Outline
• Application model
• Static analysis
• Memory layout
• Evaluation
• Conclusions
A Model for Stream Applications
• StreamIt – synchronous data flow (SDF) language
  – application = graph of filters connected with FIFO channels
    • limited memory operations: pop(), peek(), and push()
    • known consumption and production rates
[Figure: Filter::work() reads its INPUT channel with pop and peek and writes its OUTPUT channel with push]
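The filter model above can be illustrated with a small Python sketch. It is not StreamIt code: the `Channel` and `MovingAverage` classes, the `run` helper, and the chosen rates (peek 3, pop 1, push 1) are assumptions made only for this example.

```python
from collections import deque


class Channel:
    """FIFO channel between two filters (hypothetical helper, not StreamIt's runtime)."""

    def __init__(self, items=()):
        self._q = deque(items)

    def push(self, value):
        self._q.append(value)      # producer side: append at the tail

    def pop(self):
        return self._q.popleft()   # consumer side: remove the head

    def peek(self, i):
        return self._q[i]          # read the i-th element without consuming it

    def __len__(self):
        return len(self._q)


class MovingAverage:
    """Filter with static rates (assumed for illustration): peek 3, pop 1, push 1."""

    PEEK, POP, PUSH = 3, 1, 1

    def __init__(self, inp, out):
        self.inp, self.out = inp, out

    def can_fire(self):
        # The filter may only fire when its peek window is fully available.
        return len(self.inp) >= self.PEEK

    def work(self):
        window = [self.inp.peek(i) for i in range(self.PEEK)]
        self.out.push(sum(window) / self.PEEK)
        self.inp.pop()


def run(samples):
    """Fire the filter until the input can no longer satisfy its peek rate."""
    inp, out = Channel(samples), Channel()
    f = MovingAverage(inp, out)
    while f.can_fire():
        f.work()
    return [out.pop() for _ in range(len(out))]
```

Because the rates are fixed class constants, the number of elements each firing consumes and produces is known before the program runs, which is the property the static analysis relies on.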
A Model for Stream Applications
• StreamIt – synchronous data flow language
  – applications are constructed hierarchically
    • pipeline of streams
    • splits and joins (splitter and joiner)
  – pass-by-value semantics
    • a naïve implementation would incur a significant number of copies
[Figure: example stream graph with Source, Duplicate splitter, LPF1, LPF2, Round-Robin joiner, Subtract, and Sink]
Insight
• SDFs may be executed in a cyclo-static schedule
  – the complete memory behavior of the program may be observed within one execution of the schedule
• Our solution: static analysis + memory layout
[Figure: the example stream graph (Source, Duplicate, LPF1, LPF2, RoundRobin, Subtract, Sink) with its schedule]
INIT PHASE: Source,3 DUP,3 LPF1,1 LPF2,1 RR,1 Sub,1 Sink
STEADY PHASE: Source,1 DUP,1 LPF1,1 LPF2,1 RR,1 Sub,1 Sink
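The insight that one pass over the schedule exposes the full memory behavior can be checked with a token-counting simulation. The sketch below is only an illustration: the per-filter rates (for instance, each LPF peeking 3 and popping 1), the channel names, and the firing count of 1 for Sink are assumptions, since the slides do not state them.

```python
# name -> (inputs {channel: (peek, pop)}, outputs {channel: push}); all rates are assumed.
FILTERS = {
    "Source": ({}, {"src_dup": 1}),
    "DUP": ({"src_dup": (1, 1)}, {"dup_lpf1": 1, "dup_lpf2": 1}),
    "LPF1": ({"dup_lpf1": (3, 1)}, {"lpf1_rr": 1}),
    "LPF2": ({"dup_lpf2": (3, 1)}, {"lpf2_rr": 1}),
    "RR": ({"lpf1_rr": (1, 1), "lpf2_rr": (1, 1)}, {"rr_sub": 2}),
    "Sub": ({"rr_sub": (2, 2)}, {"sub_sink": 1}),
    "Sink": ({"sub_sink": (1, 1)}, {}),
}

# Firing counts follow the slide; the count for Sink (1) is an assumption.
INIT = [("Source", 3), ("DUP", 3), ("LPF1", 1), ("LPF2", 1),
        ("RR", 1), ("Sub", 1), ("Sink", 1)]
STEADY = [("Source", 1), ("DUP", 1), ("LPF1", 1), ("LPF2", 1),
          ("RR", 1), ("Sub", 1), ("Sink", 1)]


def run_phase(schedule, occupancy, peak):
    """Fire each filter `count` times, updating channel occupancy and its peak."""
    for name, count in schedule:
        inputs, outputs = FILTERS[name]
        for _ in range(count):
            for ch, (peek, _pop) in inputs.items():
                if occupancy[ch] < peek:
                    raise RuntimeError(f"{name} cannot fire: {ch} holds {occupancy[ch]}")
            for ch, (_peek, pop) in inputs.items():
                occupancy[ch] -= pop
            for ch, push in outputs.items():
                occupancy[ch] += push
                peak[ch] = max(peak[ch], occupancy[ch])


def simulate():
    """Return channel occupancy after INIT, after one STEADY pass, and the peaks."""
    channels = sorted({ch for ins, outs in FILTERS.values() for ch in (*ins, *outs)})
    occupancy = dict.fromkeys(channels, 0)
    peak = dict.fromkeys(channels, 0)
    run_phase(INIT, occupancy, peak)
    after_init = dict(occupancy)
    run_phase(STEADY, occupancy, peak)
    return after_init, dict(occupancy), peak
```

Under these assumed rates, the occupancy after one steady-state pass equals the occupancy after the init phase, so every later pass repeats the same behavior and the recorded peaks bound the buffer sizes.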
Component Analysis
• Location Sharing
  – an output element is pushed from an unmodified input element
  – each I/O element is associated with a pop/push index
• Temporal Sharing
  – an output element reuses the input element storage
  – each I/O element is associated with a live range [i, j]
• Builds on abstract interpretation
  – build a Control-Flow Graph (CFG) for each filter
  – abstract interpretation of memory operations
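To make temporal sharing concrete, the sketch below treats live ranges as closed intervals [i, j] and lets two elements share storage only when their intervals do not overlap. The greedy slot assignment is a generic interval-scheduling technique used here for illustration, not the layout algorithm from the talk; the element names and ranges in the usage example are hypothetical.

```python
def overlaps(a, b):
    """True when closed intervals a = (i, j) and b = (i, j) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def assign_slots(live_ranges):
    """Greedily map each element to a storage slot.

    Elements are visited in order of live-range start. An element reuses the
    first slot whose previous occupant died strictly before it becomes live;
    otherwise a new slot is opened.
    """
    slot_end = []      # slot_end[k] = end of the live range currently in slot k
    placement = {}
    for name, (start, end) in sorted(live_ranges.items(), key=lambda kv: kv[1]):
        for k, last_end in enumerate(slot_end):
            if last_end < start:
                slot_end[k] = end
                placement[name] = k
                break
        else:
            slot_end.append(end)
            placement[name] = len(slot_end) - 1
    return placement
```

For example, `assign_slots({"in0": (0, 0), "in1": (1, 3), "out0": (2, 2), "out1": (4, 4)})` places the four elements in two slots, because only `in1` and `out0` are live at the same time.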
Component Analysis
• Abstract interpretation of memory operations
  – memory counter (MC) – relative order of operations
  – indexes of the current push (out) and pop (in)
  – live range for each input (LIN) and output (LOUT) element
• Indexes and live ranges are represented as intervals
• Subset of rules for determining live ranges:
  push: (MC, out, LOUT) → LOUT[out] ⊔ MC, out++, MC++
  pop: (MC, in, LIN) → LIN[in] ⊔ MC, in++, MC++
  join: (MC1, in1, out1), (MC2, in2, out2) → (MC = max(MC1, MC2), in = in1 ⊔ in2, out = out1 ⊔ out2)
Example of Component Analysis
RULE (pop): (MC, in, LIN) → LIN[in] ⊔ MC, in++, MC++
STATE before: MC = 0, in = [0,0], out = [0,0]
STATE after: MC = 1, in = [1,1], out = [0,0]
CFG annotation: LIN[0] = LIN[0] ⊔ [0,0]
Live ranges: LIN[0] = [0,0], LOUT[0] = ∅, LOUT[1] = ∅
Example of Component Analysis
RULE (push): (MC, out, LOUT) → LOUT[out] ⊔ MC, out++, MC++
STATE before: MC = 1, in = [1,1], out = [0,0]
STATE after: MC = 2, in = [1,1], out = [1,1]
CFG annotation: LOUT[0] = LOUT[0] ⊔ [1,1]
Live ranges: LIN[0] = [0,0], LOUT[0] = [1,1], LOUT[1] = ∅
Example of Component Analysis
RULE (join): (MC1, in1, out1), (MC2, in2, out2) → (MC = max(MC1, MC2), in = in1 ⊔ in2, out = out1 ⊔ out2)
STATE on first incoming edge: MC = 1, in = [1,1], out = [0,0]
STATE on second incoming edge: MC = 2, in = [1,1], out = [1,1]
STATE after: MC = 2, in = [1,1], out = [0,1]
Live ranges: LIN[0] = [0,0], LOUT[0] = [1,1], LOUT[1] = ∅
Example of Component Analysis
RULE (push): (MC, out, LOUT) → LOUT[out] ⊔ MC, out++, MC++
STATE before: MC = 2, in = [1,1], out = [0,1]
STATE after: MC = 3, in = [1,1], out = [1,2]
CFG annotation: LOUT[0,1] = LOUT[0,1] ⊔ [2,2]
Live ranges: LIN[0] = [0,0], LOUT[0] = [1,1], LOUT[0,1] = [2,2]
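The four steps of this example can be replayed with a small interval-domain interpreter. This is a minimal sketch under stated assumptions: the `State`, `join`, `pop`, `push`, `merge`, and `replay` names are hypothetical, tables are keyed by the index interval as in the slides' LOUT[0,1] entry, and the control flow (a pop, an optional push, a merge, then a push) is inferred from the states shown.

```python
from typing import Dict, NamedTuple, Optional, Tuple

Interval = Tuple[int, int]
Table = Dict[Interval, Interval]   # index interval -> live range


def join(a: Optional[Interval], b: Optional[Interval]) -> Optional[Interval]:
    """Interval join (the ⊔ on the slides); None stands for the empty range ∅."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def bump(iv: Interval) -> Interval:
    """Increment both bounds of an index interval (in++ / out++)."""
    return (iv[0] + 1, iv[1] + 1)


class State(NamedTuple):
    mc: int          # memory counter
    inp: Interval    # current pop index ("in" is a Python keyword)
    out: Interval    # current push index


def pop(s: State, lin: Table) -> State:
    lin[s.inp] = join(lin.get(s.inp), (s.mc, s.mc))      # LIN[in] ⊔ MC
    return State(s.mc + 1, bump(s.inp), s.out)           # in++, MC++


def push(s: State, lout: Table) -> State:
    lout[s.out] = join(lout.get(s.out), (s.mc, s.mc))    # LOUT[out] ⊔ MC
    return State(s.mc + 1, s.inp, bump(s.out))           # out++, MC++


def merge(s1: State, s2: State) -> State:
    """Join rule for two incoming CFG edges."""
    return State(max(s1.mc, s2.mc), join(s1.inp, s2.inp), join(s1.out, s2.out))


def replay():
    """Replay the example: pop, optional push, merge, push."""
    lin: Table = {}
    lout: Table = {}
    s0 = State(0, (0, 0), (0, 0))
    s1 = pop(s0, lin)          # slide 11
    s2 = push(s1, lout)        # slide 12 (branch that pushes)
    s3 = merge(s1, s2)         # slide 13 (branch that skips the push is s1)
    s4 = push(s3, lout)        # slide 14
    return s3, s4, lin, lout
```

Keying the tables by index interval keeps the imprecision visible: after the merge the analysis no longer knows whether the next push writes element 0 or element 1, so the live range [2,2] is recorded against the index interval [0,1].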
Whole Program Analysis
• Component analysis constructs a memory fragment
  – captures live ranges for temporal reuse
  – captures location sharing edges
• Whole program analysis constructs a memory graph
  – stitches together memory fragments
  – simulates the schedule to
    • connect location sharing edges into paths and
    • extend live ranges with the phase number and invocation index
• Our approach:
  – analysis is precise when there is no input dependency
  – otherwise, it is a sound approximation
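One way to picture the stitching step is to group all elements that are linked by location sharing edges, so that each group can be backed by a single storage location. The sketch below uses a plain union-find for this; it is an illustrative stand-in rather than the memory-graph construction from the talk, and the element names in the usage example are hypothetical.

```python
def find(parent, x):
    """Return the representative of x, compressing the path as it goes."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def stitch(edges):
    """Group elements connected by location sharing edges.

    `edges` is an iterable of (element_a, element_b) pairs. The result is a
    sorted list of sorted groups; every element in a group may share storage.
    """
    parent = {}
    for a, b in edges:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[rb] = ra
    groups = {}
    for x in parent:
        groups.setdefault(find(parent, x), set()).add(x)
    return sorted(sorted(g) for g in groups.values())
```

For instance, feeding it the edges `("Source.out[0]", "DUP.in[0]")`, `("DUP.in[0]", "DUP.outA[0]")`, and `("DUP.in[0]", "DUP.outB[0]")` yields one group of four elements, which mirrors how a duplicate splitter's outputs can alias its input instead of being copied.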
Memory Layout
• Empirical insights
  – split-joins that manipulate location-shared elements can be eliminated
  – a filter can usually reuse its input memory
• Heuristic approaches to resolving temporal reuse conflicts
[Figure: layouts of A memory, B memory, and other components in three cases: No conflict, Append on Conflict (AoC), Insert-in-Place (IP)]
Evaluation
Experimental Setup
• Intel x86_64 on Mac OS X 10.10.3
  – 3 GHz Intel Xeon CPU E5-1680 v2
  – 32 KB L1 instruction + 32 KB L1 data caches
  – 256 KB L2 + 25 MB L3 caches
• StreamIt Compiler
  – baseline: default settings without optimizations
  – cache optimizations enabled with –cacheopt
  – gcc -O3 to compile generated C/C++ code
• 11 micro benchmarks from StreamIt
• 3 macro benchmarks from real MSAs
  – BeepBeep [Peng, C., et al. 2007]
  – MFCC and Crowd [Xu, C., et al. 2013]
Memory Usage on Intel x86_64
– ESMS reduces both channel buffer sizes and the number of memory operations from splitters, joiners and reordering filters
[Figure: memory usage per benchmark. Annotations: 45% to 96% reductions; 73% reductions on average]
Speedup on Intel x86_64
– Compared with baseline StreamIt
– The average speedups of AA, AoC, and IP are 3, 3.1, and 3, while the average speedup of CacheOpt is merely 1.07.
– ESMS improves performance by eliminating unnecessary memory operations and reducing cache/memory references.
Conclusions
• Static memory management is effective for stream languages
  – whole-program memory behavior may be characterized
  – both location and temporal sharing opportunities are exploited
  – performance improves due to fewer memory operations and references
• ESMS provides significant performance improvements
  – 45% to 96% data size reduction
  – 73% code size reduction
  – 3X speedup
Acknowledgements
• National Science Foundation (NeTs grant #1144664)
• Carver Foundation (grant #14-43555)
CSense Toolkit
Questions?
Thank You
Ad

More Related Content

Similar to Static Memory Management for Efficient Mobile Sensing Applications (20)

Dependable Operation - Performance Management and Capacity Planning Under Con...
Dependable Operation - Performance Management and Capacity Planning Under Con...Dependable Operation - Performance Management and Capacity Planning Under Con...
Dependable Operation - Performance Management and Capacity Planning Under Con...
Liming Zhu
 
It Does What You Say, Not What You Mean: Lessons From A Decade of Program Repair
It Does What You Say, Not What You Mean: Lessons From A Decade of Program RepairIt Does What You Say, Not What You Mean: Lessons From A Decade of Program Repair
It Does What You Say, Not What You Mean: Lessons From A Decade of Program Repair
Claire Le Goues
 
Parallel machines flinkforward2017
Parallel machines flinkforward2017Parallel machines flinkforward2017
Parallel machines flinkforward2017
Nisha Talagala
 
22cggggffhhfdffgv091F0014 FINAL PPT-1.pptx
22cggggffhhfdffgv091F0014 FINAL PPT-1.pptx22cggggffhhfdffgv091F0014 FINAL PPT-1.pptx
22cggggffhhfdffgv091F0014 FINAL PPT-1.pptx
andirajukeshavakrish
 
Automated Discovery of Performance Regressions in Enterprise Applications
Automated Discovery of Performance Regressions in Enterprise ApplicationsAutomated Discovery of Performance Regressions in Enterprise Applications
Automated Discovery of Performance Regressions in Enterprise Applications
SAIL_QU
 
Spark Technology Center IBM
Spark Technology Center IBMSpark Technology Center IBM
Spark Technology Center IBM
DataWorks Summit/Hadoop Summit
 
Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...
Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...
Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...
adil raja
 
Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...
Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...
Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...
adil raja
 
Operationalizing Machine Learning: Serving ML Models
Operationalizing Machine Learning: Serving ML ModelsOperationalizing Machine Learning: Serving ML Models
Operationalizing Machine Learning: Serving ML Models
Lightbend
 
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
HostedbyConfluent
 
Model-based programming and AI-assisted software development
Model-based programming and AI-assisted software developmentModel-based programming and AI-assisted software development
Model-based programming and AI-assisted software development
Eficode
 
Tutorial-on-DNN-09A-Co-design-Sparsity.pdf
Tutorial-on-DNN-09A-Co-design-Sparsity.pdfTutorial-on-DNN-09A-Co-design-Sparsity.pdf
Tutorial-on-DNN-09A-Co-design-Sparsity.pdf
Duy-Hieu Bui
 
Making Model-Driven Verification Practical and Scalable: Experiences and Less...
Making Model-Driven Verification Practical and Scalable: Experiences and Less...Making Model-Driven Verification Practical and Scalable: Experiences and Less...
Making Model-Driven Verification Practical and Scalable: Experiences and Less...
Lionel Briand
 
OSLC KM: Elevating the meaning of data and operations within the toolchain
OSLC KM: Elevating the meaning of data and operations within the toolchainOSLC KM: Elevating the meaning of data and operations within the toolchain
OSLC KM: Elevating the meaning of data and operations within the toolchain
CARLOS III UNIVERSITY OF MADRID
 
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
Bonnie Hurwitz
 
POD-Diagnosis: Error Detection and Diagnosis of Sporadic Operations on Cloud ...
POD-Diagnosis: Error Detection and Diagnosis of Sporadic Operations on Cloud ...POD-Diagnosis: Error Detection and Diagnosis of Sporadic Operations on Cloud ...
POD-Diagnosis: Error Detection and Diagnosis of Sporadic Operations on Cloud ...
Liming Zhu
 
Hybrid Knowledge Bases for Real-Time Robotic Reasoning
Hybrid Knowledge Bases for Real-Time Robotic ReasoningHybrid Knowledge Bases for Real-Time Robotic Reasoning
Hybrid Knowledge Bases for Real-Time Robotic Reasoning
Hassan Rifky
 
John W. Vinti Particle Tracker Final Presentation
John W. Vinti Particle Tracker Final PresentationJohn W. Vinti Particle Tracker Final Presentation
John W. Vinti Particle Tracker Final Presentation
John Vinti
 
Machine Learning At Speed: Operationalizing ML For Real-Time Data Streams
Machine Learning At Speed: Operationalizing ML For Real-Time Data StreamsMachine Learning At Speed: Operationalizing ML For Real-Time Data Streams
Machine Learning At Speed: Operationalizing ML For Real-Time Data Streams
Lightbend
 
Parallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdf
Parallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdfParallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdf
Parallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdf
Slide_N
 
Dependable Operation - Performance Management and Capacity Planning Under Con...
Dependable Operation - Performance Management and Capacity Planning Under Con...Dependable Operation - Performance Management and Capacity Planning Under Con...
Dependable Operation - Performance Management and Capacity Planning Under Con...
Liming Zhu
 
It Does What You Say, Not What You Mean: Lessons From A Decade of Program Repair
It Does What You Say, Not What You Mean: Lessons From A Decade of Program RepairIt Does What You Say, Not What You Mean: Lessons From A Decade of Program Repair
It Does What You Say, Not What You Mean: Lessons From A Decade of Program Repair
Claire Le Goues
 
Parallel machines flinkforward2017
Parallel machines flinkforward2017Parallel machines flinkforward2017
Parallel machines flinkforward2017
Nisha Talagala
 
22cggggffhhfdffgv091F0014 FINAL PPT-1.pptx
22cggggffhhfdffgv091F0014 FINAL PPT-1.pptx22cggggffhhfdffgv091F0014 FINAL PPT-1.pptx
22cggggffhhfdffgv091F0014 FINAL PPT-1.pptx
andirajukeshavakrish
 
Automated Discovery of Performance Regressions in Enterprise Applications
Automated Discovery of Performance Regressions in Enterprise ApplicationsAutomated Discovery of Performance Regressions in Enterprise Applications
Automated Discovery of Performance Regressions in Enterprise Applications
SAIL_QU
 
Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...
Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...
Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...
adil raja
 
Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...
Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...
Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...
adil raja
 
Operationalizing Machine Learning: Serving ML Models
Operationalizing Machine Learning: Serving ML ModelsOperationalizing Machine Learning: Serving ML Models
Operationalizing Machine Learning: Serving ML Models
Lightbend
 
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
HostedbyConfluent
 
Model-based programming and AI-assisted software development
Model-based programming and AI-assisted software developmentModel-based programming and AI-assisted software development
Model-based programming and AI-assisted software development
Eficode
 
Tutorial-on-DNN-09A-Co-design-Sparsity.pdf
Tutorial-on-DNN-09A-Co-design-Sparsity.pdfTutorial-on-DNN-09A-Co-design-Sparsity.pdf
Tutorial-on-DNN-09A-Co-design-Sparsity.pdf
Duy-Hieu Bui
 
Making Model-Driven Verification Practical and Scalable: Experiences and Less...
Making Model-Driven Verification Practical and Scalable: Experiences and Less...Making Model-Driven Verification Practical and Scalable: Experiences and Less...
Making Model-Driven Verification Practical and Scalable: Experiences and Less...
Lionel Briand
 
OSLC KM: Elevating the meaning of data and operations within the toolchain
OSLC KM: Elevating the meaning of data and operations within the toolchainOSLC KM: Elevating the meaning of data and operations within the toolchain
OSLC KM: Elevating the meaning of data and operations within the toolchain
CARLOS III UNIVERSITY OF MADRID
 
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
Bonnie Hurwitz
 
POD-Diagnosis: Error Detection and Diagnosis of Sporadic Operations on Cloud ...
POD-Diagnosis: Error Detection and Diagnosis of Sporadic Operations on Cloud ...POD-Diagnosis: Error Detection and Diagnosis of Sporadic Operations on Cloud ...
POD-Diagnosis: Error Detection and Diagnosis of Sporadic Operations on Cloud ...
Liming Zhu
 
Hybrid Knowledge Bases for Real-Time Robotic Reasoning
Hybrid Knowledge Bases for Real-Time Robotic ReasoningHybrid Knowledge Bases for Real-Time Robotic Reasoning
Hybrid Knowledge Bases for Real-Time Robotic Reasoning
Hassan Rifky
 
John W. Vinti Particle Tracker Final Presentation
John W. Vinti Particle Tracker Final PresentationJohn W. Vinti Particle Tracker Final Presentation
John W. Vinti Particle Tracker Final Presentation
John Vinti
 
Machine Learning At Speed: Operationalizing ML For Real-Time Data Streams
Machine Learning At Speed: Operationalizing ML For Real-Time Data StreamsMachine Learning At Speed: Operationalizing ML For Real-Time Data Streams
Machine Learning At Speed: Operationalizing ML For Real-Time Data Streams
Lightbend
 
Parallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdf
Parallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdfParallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdf
Parallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdf
Slide_N
 

Recently uploaded (20)

Adobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest VersionAdobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest Version
kashifyounis067
 
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdfMicrosoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
TechSoup
 
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Dele Amefo
 
Download YouTube By Click 2025 Free Full Activated
Download YouTube By Click 2025 Free Full ActivatedDownload YouTube By Click 2025 Free Full Activated
Download YouTube By Click 2025 Free Full Activated
saniamalik72555
 
Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025
mu394968
 
Why Orangescrum Is a Game Changer for Construction Companies in 2025
Why Orangescrum Is a Game Changer for Construction Companies in 2025Why Orangescrum Is a Game Changer for Construction Companies in 2025
Why Orangescrum Is a Game Changer for Construction Companies in 2025
Orangescrum
 
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRYLEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
NidaFarooq10
 
Solidworks Crack 2025 latest new + license code
Solidworks Crack 2025 latest new + license codeSolidworks Crack 2025 latest new + license code
Solidworks Crack 2025 latest new + license code
aneelaramzan63
 
Not So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java WebinarNot So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java Webinar
Tier1 app
 
Adobe Master Collection CC Crack Advance Version 2025
Adobe Master Collection CC Crack Advance Version 2025Adobe Master Collection CC Crack Advance Version 2025
Adobe Master Collection CC Crack Advance Version 2025
kashifyounis067
 
Meet the Agents: How AI Is Learning to Think, Plan, and Collaborate
Meet the Agents: How AI Is Learning to Think, Plan, and CollaborateMeet the Agents: How AI Is Learning to Think, Plan, and Collaborate
Meet the Agents: How AI Is Learning to Think, Plan, and Collaborate
Maxim Salnikov
 
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Andre Hora
 
Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]
saniaaftab72555
 
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
Andre Hora
 
Douwan Crack 2025 new verson+ License code
Douwan Crack 2025 new verson+ License codeDouwan Crack 2025 new verson+ License code
Douwan Crack 2025 new verson+ License code
aneelaramzan63
 
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software DevelopmentSecure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Shubham Joshi
 
PDF Reader Pro Crack Latest Version FREE Download 2025
PDF Reader Pro Crack Latest Version FREE Download 2025PDF Reader Pro Crack Latest Version FREE Download 2025
PDF Reader Pro Crack Latest Version FREE Download 2025
mu394968
 
Who Watches the Watchmen (SciFiDevCon 2025)
Who Watches the Watchmen (SciFiDevCon 2025)Who Watches the Watchmen (SciFiDevCon 2025)
Who Watches the Watchmen (SciFiDevCon 2025)
Allon Mureinik
 
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
AxisTechnolabs
 
FL Studio Producer Edition Crack 2025 Full Version
FL Studio Producer Edition Crack 2025 Full VersionFL Studio Producer Edition Crack 2025 Full Version
FL Studio Producer Edition Crack 2025 Full Version
tahirabibi60507
 
Adobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest VersionAdobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest Version
kashifyounis067
 
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdfMicrosoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
TechSoup
 
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Salesforce Data Cloud- Hyperscale data platform, built for Salesforce.
Dele Amefo
 
Download YouTube By Click 2025 Free Full Activated
Download YouTube By Click 2025 Free Full ActivatedDownload YouTube By Click 2025 Free Full Activated
Download YouTube By Click 2025 Free Full Activated
saniamalik72555
 
Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025
mu394968
 
Why Orangescrum Is a Game Changer for Construction Companies in 2025
Why Orangescrum Is a Game Changer for Construction Companies in 2025Why Orangescrum Is a Game Changer for Construction Companies in 2025
Why Orangescrum Is a Game Changer for Construction Companies in 2025
Orangescrum
 
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRYLEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
NidaFarooq10
 
Solidworks Crack 2025 latest new + license code
Solidworks Crack 2025 latest new + license codeSolidworks Crack 2025 latest new + license code
Solidworks Crack 2025 latest new + license code
aneelaramzan63
 
Not So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java WebinarNot So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java Webinar
Tier1 app
 
Adobe Master Collection CC Crack Advance Version 2025
Adobe Master Collection CC Crack Advance Version 2025Adobe Master Collection CC Crack Advance Version 2025
Adobe Master Collection CC Crack Advance Version 2025
kashifyounis067
 
Meet the Agents: How AI Is Learning to Think, Plan, and Collaborate
Meet the Agents: How AI Is Learning to Think, Plan, and CollaborateMeet the Agents: How AI Is Learning to Think, Plan, and Collaborate
Meet the Agents: How AI Is Learning to Think, Plan, and Collaborate
Maxim Salnikov
 
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Andre Hora
 
Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]
saniaaftab72555
 
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
Andre Hora
 
Douwan Crack 2025 new verson+ License code
Douwan Crack 2025 new verson+ License codeDouwan Crack 2025 new verson+ License code
Douwan Crack 2025 new verson+ License code
aneelaramzan63
 
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software DevelopmentSecure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Shubham Joshi
 
PDF Reader Pro Crack Latest Version FREE Download 2025
PDF Reader Pro Crack Latest Version FREE Download 2025PDF Reader Pro Crack Latest Version FREE Download 2025
PDF Reader Pro Crack Latest Version FREE Download 2025
mu394968
 
Who Watches the Watchmen (SciFiDevCon 2025)
Who Watches the Watchmen (SciFiDevCon 2025)Who Watches the Watchmen (SciFiDevCon 2025)
Who Watches the Watchmen (SciFiDevCon 2025)
Allon Mureinik
 
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
AxisTechnolabs
 
FL Studio Producer Edition Crack 2025 Full Version
FL Studio Producer Edition Crack 2025 Full VersionFL Studio Producer Edition Crack 2025 Full Version
FL Studio Producer Edition Crack 2025 Full Version
tahirabibi60507
 
Ad

Static Memory Management for Efficient Mobile Sensing Applications

  • 1. University of Iowa | Mobile Sensing Laboratory Static Memory Management for Efficient Mobile Sensing Applications EMSOFT 2015 Farley Lai, Daniel Schmidt, Octav Chipara Department of Computer Science
  • 2. University of Iowa | Mobile Sensing Laboratory • A class of applications that process continuous input data streams and may produce continuous output streams – real-time processing – efficient resource management Emerging Mobile Sensing Applications 2 Speaker Models Speech Recording VAD Feature Extraction HTTP Upload Speaker Identifier Introduction Sensing Stream Processing
  • 3. University of Iowa | Mobile Sensing Laboratory • Workload: stream operations on frames of samples – e.g., windowing, splitting, or appending – stream operation tend to be memory intensive • Goal: implement stream operations efficiently – reduce memory footprint – reduce number of memory accesses • Challenges: – handle complex interaction between components – avoid unnecessary memory copies – enable data sharing between components The Memory Management Challenge 3 Introduction
  • 4. University of Iowa | Mobile Sensing Laboratory • Dynamic memory management – specialized data structures to implement memory management • e.g., SigSeg [Girod, et al. 2008] – linked list of buffered samples – a level of indirection in accessing streaming data • Static memory management – no runtime overhead – requires precise knowledge of the variable live ranges • difficult to achieve in complex applications • must be time-efficient to be included in compilers Approaches to Memory Management 4 Introduction [Girod2008] L. Girod, Y. Mei, R. Newton, S. Rost, A. Thiagarajan, H. Balakrishnan, and S. Madden, “XStream: a Signal-Oriented Data Stream Management System,” in ICDE, 2008.
  • 5. University of Iowa | Mobile Sensing Laboratory • Application model • Static analysis • Memory layout • Evaluation • Conclusions Outline 5
  • 6. University of Iowa | Mobile Sensing Laboratory • StreamIt – synchronous data flow (SDF) language – application = graph of filters connected with FIFO channels • limited memory operations: pop(), peek(), and push() • known consumption and production rates A Model for Stream Applications 6 pop peek push Filter::work() INPUT: OUTPUT:
  • 7. University of Iowa | Mobile Sensing Laboratory • StreamIt – synchronous data flow language – applications are constructed hierarchically • pipeline of streams • split and joins (splitter and joiner) – pass-by-value semantics • naïve implementation would incur significant number of copies A Model for Stream Applications 7 LPF2 Source Duplicate LPF1 Subtract Sink Round-Robin
  • 8. University of Iowa | Mobile Sensing Laboratory • SDFs may be executed in a cyclo-static schedule – the complete memory behavior of the program may be observed within one execution of the schedule • Our solution: static analysis + memory layout Insight 8 LPF2 Source Duplicate LPF1 Subtract Sink RoundRobin Source,3 DUP, 3 LPF1,1 LPF2,1 Source,1 DUP, 1 LPF1,1 LPF2,1 RR,1 Sub,1 Sink INIT PHASE: STEADY PHASE: RR,1 Sub,1 Sink
  • 9. Component Analysis • Location Sharing – an output element is pushed from an unmodified input element – each I/O element is associated with a pop/push index • Temporal Sharing – an output element reuses the input element storage – each I/O element is associated with a live range [i, j] • Builds on abstract interpretation – build a Control-Flow Graph (CFG) for each filter – abstract interpretation of memory operations
  • 10. Component Analysis • Abstract interpretation of memory operations – memory counter (MC) – relative order of operations – indexes of current push (out) and pop (in) – live range for each input (LIN) and output (LOUT) element • Indexes and live ranges represented as intervals • Subset of rules for determining live ranges: – push: (MC, out, LOUT) ⟹ LOUT[out] = LOUT[out] ⊔ [MC, MC], out++, MC++ – pop: (MC, in, LIN) ⟹ LIN[in] = LIN[in] ⊔ [MC, MC], in++, MC++ – join: (MC1, in1, out1), (MC2, in2, out2) ⟹ (MC = max(MC1, MC2), in = in1 ⊔ in2, out = out1 ⊔ out2)
  • 11. Example of Component Analysis • Rule (pop): (MC, LIN, in) ⟹ LIN[in] = LIN[in] ⊔ [MC, MC], in++, MC++ • State before: MC = 0, in = [0,0], out = [0,0] • State after: MC = 1, in = [1,1], out = [0,0] • Update: LIN[0] = LIN[0] ⊔ [0,0] • Live ranges: LIN[0] = [0,0], LOUT[0] = ∅, LOUT[1] = ∅ [Figure: CFG of the example filter]
  • 12. Example of Component Analysis • Rule (push): (MC, LOUT, out) ⟹ LOUT[out] = LOUT[out] ⊔ [MC, MC], out++, MC++ • State before: MC = 1, in = [1,1], out = [0,0] • State after: MC = 2, in = [1,1], out = [1,1] • Update: LOUT[0] = LOUT[0] ⊔ [1,1] • Live ranges: LIN[0] = [0,0], LOUT[0] = [1,1], LOUT[1] = ∅ [Figure: CFG of the example filter]
  • 13. Example of Component Analysis • Rule (join): (MC1, in1, out1), (MC2, in2, out2) ⟹ (MC = max(MC1, MC2), in = in1 ⊔ in2, out = out1 ⊔ out2) • Branch states: MC = 1, in = [1,1], out = [0,0] and MC = 2, in = [1,1], out = [1,1] • Joined state: MC = 2, in = [1,1], out = [0,1] • Live ranges: LIN[0] = [0,0], LOUT[0] = [1,1], LOUT[1] = ∅ [Figure: CFG of the example filter]
  • 14. Example of Component Analysis • Rule (push): (MC, LOUT, out) ⟹ LOUT[out] = LOUT[out] ⊔ [MC, MC], out++, MC++ • State before: MC = 2, in = [1,1], out = [0,1] • State after: MC = 3, in = [1,1], out = [1,2] • Update: LOUT[0,1] = LOUT[0,1] ⊔ [2,2] • Live ranges: LIN[0] = [0,0], LOUT[0] = [1,1], LOUT[0,1] = [2,2] [Figure: CFG of the example filter]
  • 15. Whole Program Analysis • Component analysis constructs a memory fragment – captures live ranges for temporal reuse – captures location sharing edges • Whole program analysis constructs a memory graph – stitches together memory fragments – simulates the schedule to • connect location sharing edges into paths and • extend live ranges with the phase number and invocation index • Our approach: – analysis is precise when there is no input dependency – otherwise, it is a sound approximation
  • 16. Memory Layout • Empirical insights – split-joins can be eliminated for manipulating location shared elements – a filter usually can reuse its input memory • Heuristic approaches to resolving temporal reuse conflicts [Figure: layouts of filter A's memory and filter B's memory in three cases: No conflict, Append on Conflict (AoC), Insert-in-Place (IP)]
  • 17. Experimental Setup • Intel x86_64 on Mac OS X 10.10.3 – 3GHz Intel Xeon CPU E5-1680 v2 – 32KB L1 instruction + 32KB L1 data caches – 256KB L2 + 25MB L3 caches • StreamIt Compiler – baseline default settings without optimizations – enabled cache optimizations with –cacheopt – gcc -O3 to compile generated C/C++ code • 11 micro benchmarks from StreamIt • 3 macro benchmarks from real MSAs – BeepBeep [Peng, C., et al. 2007] – MFCC and Crowd [Xu, C., et al. 2013]
  • 18. Memory Usage on Intel x86_64 – ESMS reduces both channel buffer sizes and the number of memory operations from splitters, joiners and reordering filters [Figures: code size reduction (73% on average) and data size reduction (45% to 96%)]
  • 19. Speedup on Intel x86_64 – Compared with baseline StreamIt – The average speedups of AA, AoC, and IP are 3, 3.1, and 3, while the average speedup of CacheOpt is merely 1.07. – ESMS improves the performance by eliminating unnecessary memory operations and reducing cache/memory references.
  • 20. Conclusions • Static memory management is effective for stream languages – whole program memory behaviors may be characterized – both location and temporal sharing opportunities are exploited – performance improvement due to fewer memory operations and references • ESMS provides significant performance improvements – 45% to 96% data size reduction – 73% code size reduction – 3X speedup
  • 21. Acknowledgements • National Science Foundation (NeTs grant #1144664) • Carver Foundation (grant #14-43555) • CSense Toolkit
  • 22. Questions? Thank You

Editor's Notes

  • #2: My name is Farley from the University of Iowa in the US. Daniel just graduated and he wrote some of the benchmarks. Prof. Chipara contributed significant ideas to this work and supervised the research. I am going to present our static memory management for mobile sensing applications.
  • #3: A mobile sensing application needs to process continuous input and output data streams. High performance and efficient resource management are both important. Such applications usually consist of a sensing phase and a stream processing phase. Take the speaker identifier application as an example. The sensing is the speech recording, while the stream processing involves voice activity detection and feature extraction. The features are uploaded to a remote server for identification. Unlike the sensing, the stream processing can be arbitrarily complex and compute intensive. Therefore, it is essential to develop compiler optimizations for streaming.
  • #4: In terms of the performance bottleneck of stream processing, stream operations such as windowing, splitting, and appending tend to be memory intensive. Therefore, the goal is to implement stream operations efficiently. This involves reducing the memory footprint and the number of memory accesses. To achieve this, we have to handle complex component interactions, avoid unnecessary memory copies, and enable data sharing between components.
  • #5: Traditionally, memory management can be dynamic or static. Dynamic memory management can simply rely on garbage collection, or on some runtime analysis and data structures. For example, the previous work SigSeg uses a linked-list-like structure to manage buffered samples. However, the runtime overhead due to a level of indirection is inevitable. On the other hand, static memory management does not suffer runtime overhead, but it requires precise knowledge of the variable live ranges. This is difficult in complex applications, not to mention that the analysis must be efficient for practical use.
  • #6: For the remainder of the talk, I will first give an overview of the application model that our optimization is applied to. Next, I will go through the static analysis and the memory layout, followed by the evaluation and conclusions.
  • #7: We adopt a well-defined synchronous data-flow language called StreamIt from MIT to facilitate our optimization. In StreamIt, a program is represented as a graph of filters connected with FIFO channels. A filter has a work() function and serves as the basic processing unit. A filter must use peek() and pop() to access its input channel, and push() to access its output channel. In addition, the data consumption and production rates in one filter invocation are known and fixed at compile time.
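
The filter model in this note can be illustrated with a minimal Python sketch. It is not StreamIt code and not the StreamIt runtime: the `Channel` class, the `MovingAverage` filter, and its rates (peek 3, pop 1, push 1) are hypothetical names and values chosen only to show how a work() function with fixed rates touches its channels.

```python
from collections import deque


class Channel:
    """FIFO channel between two filters (illustrative, not StreamIt's runtime)."""

    def __init__(self, items=()):
        self._buf = deque(items)

    def push(self, value):
        self._buf.append(value)

    def pop(self):
        return self._buf.popleft()

    def peek(self, offset):
        # Read without consuming; offset 0 is the next element to be popped.
        return self._buf[offset]

    def __len__(self):
        return len(self._buf)


class MovingAverage:
    """Hypothetical filter with fixed rates: peek 3, pop 1, push 1."""

    PEEK, POP, PUSH = 3, 1, 1

    def work(self, inp, out):
        window = [inp.peek(i) for i in range(self.PEEK)]
        out.push(sum(window) / self.PEEK)
        inp.pop()


def run(filt, inp, out):
    """Fire the filter while its input holds enough elements to peek."""
    firings = 0
    while len(inp) >= filt.PEEK:
        filt.work(inp, out)
        firings += 1
    return firings


source = Channel([3.0, 6.0, 9.0, 12.0, 15.0])
sink = Channel()
firings = run(MovingAverage(), source, sink)
results = [sink.pop() for _ in range(len(sink))]
```

Because the rates are constants, the number of firings and the channel occupancy after each firing can be computed without running the filter, which is the property the static analysis relies on.
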
  • #8: To compose a complex stream program, StreamIt provides hierarchical stream constructs, including pipelines and splitjoins. Here a stream is a placeholder, which can be a filter, another pipeline, or a splitjoin. A pipeline composes a sequence of streams. A splitjoin allows for parallel data flow branches. The data exchange follows pass-by-value semantics, so a naive implementation would incur significant memory copy overhead. Here is a band-pass filter example. The top-level stream is always a pipeline. There is a splitjoin that splits the source input for the downstream low-pass filters to process. The results are joined in a round-robin fashion and subtracted to produce the final output.
  • #9: The insight of using StreamIt is that the model of computation follows a static schedule that describes the order and number of filter invocations. A schedule may have an optional initialization phase that executes only once, and a steady-state phase that can repeat forever. It is possible to capture the complete memory behavior in one schedule iteration. Our solution is to first apply the static analysis to the schedule and then generate an efficient memory layout. Here is a schedule example of the band-pass filter. In the init phase, only the source and the dup splitter execute 3 times. The other filters execute once. In the steady phase, all the filters execute once.
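
As a rough illustration of why one schedule iteration is enough, the band-pass schedule can be simulated by counting elements per channel. The invocation counts come from the slide; the per-invocation rates and the channel names are assumptions, since the slides do not state them (in particular, each low-pass filter is assumed to peek 3, pop 1, and push 1, and Sink is assumed to run once per phase).

```python
# Invocation counts taken from the schedule on the slide.
INIT = [("Source", 3), ("DUP", 3), ("LPF1", 1), ("LPF2", 1),
        ("RR", 1), ("Sub", 1), ("Sink", 1)]
STEADY = [(name, 1) for name, _ in INIT]

# ASSUMED rates: filter -> (inputs as (channel, peek, pop),
#                           outputs as (channel, push)).
RATES = {
    "Source": ([], [("src_dup", 1)]),
    "DUP": ([("src_dup", 1, 1)], [("dup_lpf1", 1), ("dup_lpf2", 1)]),
    "LPF1": ([("dup_lpf1", 3, 1)], [("lpf1_rr", 1)]),
    "LPF2": ([("dup_lpf2", 3, 1)], [("lpf2_rr", 1)]),
    "RR": ([("lpf1_rr", 1, 1), ("lpf2_rr", 1, 1)], [("rr_sub", 2)]),
    "Sub": ([("rr_sub", 2, 2)], [("sub_sink", 1)]),
    "Sink": ([("sub_sink", 1, 1)], []),
}


def run_phase(phase, occupancy, peak):
    """Fire each filter the scheduled number of times, tracking buffer sizes."""
    for name, times in phase:
        inputs, outputs = RATES[name]
        for _ in range(times):
            for chan, peek, pop in inputs:
                if occupancy[chan] < peek:
                    raise RuntimeError(f"{name} cannot fire: {chan} underflow")
                occupancy[chan] -= pop
            for chan, push in outputs:
                occupancy[chan] += push
                peak[chan] = max(peak[chan], occupancy[chan])


channels = sorted({c for ins, outs in RATES.values() for c, *_ in ins + outs})
occupancy = dict.fromkeys(channels, 0)
peak = dict.fromkeys(channels, 0)

run_phase(INIT, occupancy, peak)
after_init = dict(occupancy)
run_phase(STEADY, occupancy, peak)
after_steady = dict(occupancy)
```

Under these assumed rates, the init phase leaves two elements buffered in front of each low-pass filter, and every steady iteration returns the channels to that same state, so the buffer sizes observed in one iteration bound all later ones.
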
  • #10: Our static analysis consists of the component analysis and the whole program analysis. The component analysis analyzes the work() function per filter. The goal is to capture location and temporal sharing opportunities. The location sharing associates the unmodified input elements with the corresponding output elements. The input and output elements are identified by their respective pop and push indices. The temporal sharing allows an output element to reuse an input element storage. The input and output element live ranges must be captured for safe reuse. This framework builds on abstract interpretation and data-flow analysis. Each filter work() function is represented as a CFG for analysis. Then we describe each memory operation by abstract interpretation.
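
Location sharing can be pictured with a small Python sketch that runs a work() function on marker objects and records which push indices carry an unmodified input. This is a simplified dynamic trace, not the CFG-based abstract interpretation described in the talk, and every name in it (`Tracer`, `InputRef`, `swap_pairs`, `scale`) is hypothetical.

```python
class Modified:
    """Marker for a value computed from inputs (not location-shareable)."""


class InputRef:
    """Marker standing in for the input element at a given pop index."""

    def __init__(self, index):
        self.index = index

    def __mul__(self, other):
        # Any arithmetic yields a new value, so the link to the input is lost.
        return Modified()


class Tracer:
    """Stand-in channel that numbers pops and pushes as they happen."""

    def __init__(self):
        self.pops = 0
        self.pushes = 0
        self.edges = {}  # push index -> pop index of the unmodified input

    def pop(self):
        ref = InputRef(self.pops)
        self.pops += 1
        return ref

    def push(self, value):
        if isinstance(value, InputRef):
            self.edges[self.pushes] = value.index
        self.pushes += 1


def swap_pairs(ch):
    """Hypothetical reordering filter: pops two elements, pushes them swapped."""
    first = ch.pop()
    second = ch.pop()
    ch.push(second)
    ch.push(first)


def scale(ch):
    """Hypothetical filter that modifies its input, so nothing is shared."""
    ch.push(ch.pop() * 2)


def location_sharing_edges(work):
    tracer = Tracer()
    work(tracer)
    return tracer.edges


swap_edges = location_sharing_edges(swap_pairs)
scale_edges = location_sharing_edges(scale)
```

For the reordering filter, each output index maps back to an input index, so a layout could remap indices instead of copying; for the scaling filter, no edge is recorded.
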
  • #11: The abstract interpretation includes the memory counter, element push, pop indices, and live ranges. The memory counter is the relative order of operation. We use out and in to denote the push and pop indices. We use L to denote the element live range. Indices and live ranges are viewed as intervals for set operators.
  • #12: Let’s go through an example CFG. The MC is initialized to zero. The input element live range is initialized to the interval [0, 0]. The output element live range is initialized to be empty. Then, a pop() is applied first. The input element live range union with MC=0 is still [0,0]. After that, the MC and the input index are incremented.
  • #13: Next, we get to the push(0) in the right branch. The output element live range is evaluated to be [1,1] because MC=1. Then, the MC and the push index are incremented.
  • #14: Next, we get to the join block and merge the information from both branches. We take the maximum MC. The union of the pop indices is the same. The union of the push indices becomes [0,1].
  • #15: There is one last push(x). Again, its live range is evaluated to be [2,2] because the current MC is 2. But its push index is between 0 and 1. This implies the memory behavior is input dependent and non-deterministic. The compiler needs to take a conservative estimation.
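
The example walked through in the last four notes can be replayed with a short Python sketch of the pop, push, and join rules. It is a sketch under stated assumptions, not the authors' implementation: intervals are plain tuples, `State` and `merge` are hypothetical names, and when the push index is an interval the sketch widens the live range of every concrete index it covers, whereas the slides keep a single entry keyed by the index interval.

```python
from dataclasses import dataclass, field


def join(a, b):
    """Interval join: the smallest interval covering both (None means empty)."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


@dataclass
class State:
    mc: int = 0                 # memory counter
    inp: tuple = (0, 0)         # interval of possible pop indices
    out: tuple = (0, 0)         # interval of possible push indices
    lin: dict = field(default_factory=dict)   # input index -> live range
    lout: dict = field(default_factory=dict)  # output index -> live range

    def copy(self):
        return State(self.mc, self.inp, self.out, dict(self.lin), dict(self.lout))


def _touch(ranges, index, mc):
    # ASSUMPTION: widen every concrete index covered by the index interval.
    for i in range(index[0], index[1] + 1):
        ranges[i] = join(ranges.get(i), (mc, mc))


def pop(state):
    s = state.copy()
    _touch(s.lin, s.inp, s.mc)
    s.inp = (s.inp[0] + 1, s.inp[1] + 1)
    s.mc += 1
    return s


def push(state):
    s = state.copy()
    _touch(s.lout, s.out, s.mc)
    s.out = (s.out[0] + 1, s.out[1] + 1)
    s.mc += 1
    return s


def merge(a, b):
    """Join rule applied where two CFG branches meet."""
    lin = {i: join(a.lin.get(i), b.lin.get(i)) for i in a.lin.keys() | b.lin.keys()}
    lout = {i: join(a.lout.get(i), b.lout.get(i)) for i in a.lout.keys() | b.lout.keys()}
    return State(max(a.mc, b.mc), join(a.inp, b.inp), join(a.out, b.out), lin, lout)


start = State(lin={0: (0, 0)})      # input live range initialized to [0,0]
after_pop = pop(start)              # MC 0 -> 1
right = push(after_pop)             # branch with push(0): MC 1 -> 2
left = after_pop                    # branch without memory operations
joined = merge(left, right)         # MC = 2, out = [0,1]
final = push(joined)                # last push(x): MC 2 -> 3
```

The joined state reproduces the values on the slides (MC = 2, in = [1,1], out = [0,1]), and the final push ends with MC = 3 and out = [1,2]. Because the push index is no longer a single value, the last write may land on output 0 or output 1, which is the input-dependent case that calls for a conservative estimate.
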
  • #16: After the component analysis is done, the live ranges and location sharing edges between input and output elements are saved in a memory fragment. The whole program analysis then constructs a memory graph by stitching the fragments in one schedule iteration. The location sharing edges between filters are connected into paths. The live range of the location shared element takes the union of the live ranges along the path. Now, we scale the live ranges to include all the input and output elements in different schedule phases and invocation indices. The intuition is to combine the phase number and the invocation index with the MC in the live ranges. As a result, given no input dependency, our analysis is able to characterize the complete memory behavior of the entire program because there is no pointer aliasing in StreamIt and it is guaranteed to terminate in one schedule iteration. If there is input dependency, our compiler simply enforces FIFO access and enlarges the memory layout for safety.
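
One way to picture the extended live ranges is to order events by a (phase, invocation, MC) triple and take the union of ranges along a location sharing path. The sketch below assumes that encoding; the `Stamp` type, the use of a global firing position as the invocation index, and the sample numbers are all hypothetical rather than taken from the paper.

```python
from typing import NamedTuple


class Stamp(NamedTuple):
    phase: int       # ASSUMED encoding: 0 = init phase, 1 = steady phase
    invocation: int  # ASSUMED: position of the filter firing within the phase
    mc: int          # memory counter inside that firing


def extend(local_range, phase, invocation):
    """Lift a per-filter live range [lo, hi] to whole-program stamps."""
    lo, hi = local_range
    return (Stamp(phase, invocation, lo), Stamp(phase, invocation, hi))


def union(ranges):
    """Live range of a location shared element: union along its path."""
    return (min(r[0] for r in ranges), max(r[1] for r in ranges))


def overlaps(a, b):
    """True if two closed live ranges intersect (tuples compare lexicographically)."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical element: pushed by one filter, passed through unmodified by a
# second, and last read by a third, all in the init phase.
path = union([
    extend((0, 0), phase=0, invocation=0),
    extend((0, 1), phase=0, invocation=3),
    extend((0, 2), phase=0, invocation=6),
])

# Hypothetical output written by a later firing in the same phase.
later = extend((3, 3), phase=0, invocation=8)
```

Here `path` spans from the first push to the last read, and `later` starts after it ends, so the two elements never live at the same time and could share storage.
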
  • #17: Based on the static analysis, it is straightforward to generate an efficient memory layout following the data sharing insights. First, location sharing avoids memory copies due to splitjoins and reordering filters. Second, temporal sharing allows the input memory to be reused. However, we still need to resolve temporal reuse conflicts due to non-empty live range intersections. Here is the simulation for filter B to decide its output memory layout. If all the input elements are temporally reusable, output B simply reuses its input from A, as shown in the left figure. Otherwise, we need to resolve the live range conflicts. Currently we offer three strategies. The Always Append strategy appends to enlarge the layout regardless of temporal reuse, as shown in the central figure. The Append on Conflict strategy appends to enlarge the layout whenever there is any live range conflict; therefore, it acts as either the left figure or the central figure. The Insert in Place strategy tries to reuse as much as possible and inserts the output by shifting the conflicted region to enlarge the layout. In the right figure, output B reuses its input up to the conflicted region and then inserts by shifting that region to enlarge the layout.
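
The difference between Always Append and Append on Conflict can be sketched in Python under a deliberately simplified model: filter A's output occupies slots 0..n-1, output element i of filter B would reuse slot i, and a conflict is any intersection between the two live ranges for the same slot. The function names and this position-wise pairing are assumptions. Insert in Place is left out because the notes do not give enough detail to model its shifting step faithfully.

```python
def overlaps(a, b):
    """True if two closed live ranges [lo, hi] intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def has_conflict(input_ranges, output_ranges):
    # ASSUMPTION: output element i would reuse the slot of input element i.
    return any(overlaps(i, o) for i, o in zip(input_ranges, output_ranges))


def layout(input_ranges, output_ranges, strategy):
    """Return (offsets of B's output elements, total number of slots)."""
    n, m = len(input_ranges), len(output_ranges)
    if strategy == "AA":        # Always Append: never reuse the input slots
        append = True
    elif strategy == "AoC":     # Append on Conflict: reuse only if conflict-free
        append = has_conflict(input_ranges, output_ranges)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    base = n if append else 0
    offsets = list(range(base, base + m))
    return offsets, max(n, base + m)


# Inputs are dead before the outputs are written: reuse is safe.
free_in, free_out = [(0, 0), (1, 1)], [(2, 2), (3, 3)]
# Input 0 is still live (until MC 3) when output 0 is written at MC 2.
busy_in, busy_out = [(0, 3), (1, 1)], [(2, 2), (4, 4)]

aoc_free = layout(free_in, free_out, "AoC")
aa_free = layout(free_in, free_out, "AA")
aoc_busy = layout(busy_in, busy_out, "AoC")
```

In the conflict-free case Append on Conflict keeps the layout at two slots while Always Append grows it to four; with a single conflicting element, Append on Conflict falls back to the appended layout.
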
  • #18: Next, I will present the experimental results on the Intel platform. The results on the ARM Android platform are available in the paper. Our baseline is the default StreamIt compiler without optimizations. We also compare with the StreamIt cache optimization, which increases the number of filter invocations to trade space for cache locality. To make a fair comparison, 11 of the benchmarks are from the StreamIt package. We also implemented 3 macro benchmarks extracted from real mobile sensing applications. BeepBeep is for audio localization. MFCC is the feature extraction of the speaker identifier. Crowd is for co-located speaker counting.
  • #19: Here are the memory usage reductions on the Intel platform. The right figure shows data size reductions from 45% to 96% compared with the cache optimization. The left figure shows that the code size reduction is 73% on average. This is because location sharing prevents unnecessary memory copies of shared elements. Therefore, ESMS reduces both channel buffer sizes and the number of memory operations from splitters, joiners and reordering filters.
  • #20: Finally, we evaluate the performance speedup against the baseline StreamIt. Overall, the average speedup of ESMS is about 3, while the average speedup of the StreamIt cache optimization is merely 1.07. The StreamIt cache optimization is not applicable to our macro benchmarks because it runs out of memory due to large fine-grained FFT settings. To sum up, ESMS improves the performance by removing unnecessary memory operations and reducing the number of cache/memory references with a smaller working set.
  • #21: The component fragment information is reusable and can be exposed without the source code.
  • #22: We especially thank and acknowledge our funding sources. Now, I think it’s time to take your questions.