Increasingly, organizations are relying on Kafka for mission-critical use cases where high availability and fast recovery times are essential. In particular, enterprise operators need the ability to quickly migrate applications between clusters in order to maintain business continuity during outages. In many cases, out-of-order or missing records are entirely unacceptable. MirrorMaker is a popular tool for replicating topics between clusters, but it has proven inadequate for these enterprise multi-cluster environments. Here we present MirrorMaker 2.0, an upcoming all-new replication engine designed specifically to provide disaster recovery and high availability for Kafka. We describe various replication topologies and recovery strategies using MirrorMaker 2.0 and associated tooling.
Apache Kafka in the Telco Industry (OSS, BSS, OTT, IMS, NFV, Middleware, Main...Kai Wähner
Real-time data streaming is a hot topic in the telecommunications industry. As telecommunications companies strive to offer high-speed, integrated networks with reduced connection times, connect countless devices at reduced latency, and transform the digital experience worldwide, more and more of them are turning to Apache Kafka's data stream processing to deliver a scalable, real-time infrastructure for OSS and BSS scenarios. Combining on-premise data centers, edge processing, and multi-cloud architectures is becoming the new normal in the telco industry, enabling accelerated growth from value-added services delivered over mobile networks.
Join Kai Waehner, Technology Evangelist at Confluent, for this session which explores various telecommunications use cases, including data integration, infrastructure monitoring, data distribution, data processing and business applications. Different architectures and components from the Kafka ecosystem are also discussed.
This talk explores:
- Overcoming challenges in building a modern hybrid telco infrastructure
- Building a real-time infrastructure to correlate relevant events
- Connecting thousands of devices, networks, infrastructures, and people
- Working together with different companies, organisations, and business models
- Leveraging open source and fully managed solutions from the Apache Kafka ecosystem, Confluent Platform, and Confluent Cloud
Disaster Recovery and High Availability with Kafka, SRM and MM2Abdelkrim Hadjidj
In this talk, we will present Streams Replication Manager, a new open source Kafka mirroring solution designed specifically to provide disaster recovery and high availability for Kafka. We will describe and demo various replication topologies and recovery strategies using SRM and associated tooling. Finally, we will provide an update on the ongoing work to make this engine available for the Apache Kafka community as MirrorMaker2 (KIP-382).
From Zero to Hero with Kafka Connect (Robin Moffat, Confluent) Kafka Summit L...confluent
Integrating Apache Kafka with other systems in a reliable and scalable way is often a key part of a streaming platform. Fortunately, Apache Kafka includes the Connect API that enables streaming integration both in and out of Kafka. Like any technology, understanding its architecture and deployment patterns is key to successful use, as is knowing where to go looking when things aren’t working. This talk will discuss the key design concepts within Kafka Connect and the pros and cons of standalone vs distributed deployment modes. We’ll do a live demo of building pipelines with Kafka Connect for streaming data in from databases, and out to targets including Elasticsearch. With some gremlins along the way, we’ll go hands-on in methodically diagnosing and resolving common issues encountered with Kafka Connect. The talk will finish off by discussing more advanced topics including Single Message Transforms, and deployment of Kafka Connect in containers.
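As a hedged illustration of the kind of pipeline the talk builds, here is a minimal sketch of registering a connector through a Connect worker's REST API. The worker address, connector name, table, and credentials are placeholders; the configuration keys follow the Confluent JDBC source connector.

```python
import requests

# Hypothetical JDBC source connector; host, table, and credentials are placeholders.
connector = {
    "name": "orders-jdbc-source",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:postgresql://db-host:5432/shop",
        "connection.user": "connect",
        "connection.password": "secret",
        "table.whitelist": "orders",
        "mode": "incrementing",
        "incrementing.column.name": "id",
        "topic.prefix": "pg-",
    },
}

# Kafka Connect workers expose a REST API; POST /connectors creates the connector.
resp = requests.post("http://localhost:8083/connectors", json=connector)
resp.raise_for_status()
print(resp.json())
```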
Designing a complete ci cd pipeline using argo events, workflow and cd productsJulian Mazzitelli
https://www.youtube.com/watch?v=YmIAatr3Who
Presented at Cloud and AI DevFest GDG Montreal on September 27, 2019.
Are you looking to get more flexibility out of your CI/CD platform? Interested in how GitOps fits into the mix? Learn how Argo CD, Workflows, and Events can be combined to craft custom CI/CD flows, all while staying Kubernetes-native and enabling you to leverage existing observability tooling.
Druid: Sub-Second OLAP queries over Petabytes of Streaming DataDataWorks Summit
For interactive analytics dashboards, two key requirements for a smooth user experience are sub-second response time and data freshness. Cluster computing frameworks such as Hadoop or Hive/HBase work well for storing large volumes of data, but they are not optimized for ingesting streaming data and making it available for queries in real time. Long query latencies also make these systems sub-optimal choices for powering interactive dashboards and BI use cases.
In this talk we present Druid as a complementary solution to existing Hadoop-based technologies. Druid is an open-source analytics data store designed from scratch for OLAP and business intelligence queries over massive data streams. It provides low-latency real-time data ingestion and fast, sub-second, ad-hoc data exploration queries.
Many large companies are switching to Druid for analytics, and we will cover how Druid is able to handle massive data streams and why it is a good fit for BI use cases.
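As a rough illustration of the kind of sub-second query Druid serves, here is a minimal sketch that sends a SQL query to Druid's HTTP endpoint. It assumes a broker at localhost:8082 and a hypothetical `pageviews` datasource; names and columns are illustrative.

```python
import requests

# Druid brokers accept SQL over HTTP at /druid/v2/sql.
# The datasource and column names below are illustrative.
query = {
    "query": """
        SELECT channel, COUNT(*) AS views
        FROM pageviews
        WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
        GROUP BY channel
        ORDER BY views DESC
        LIMIT 10
    """
}

resp = requests.post("http://localhost:8082/druid/v2/sql", json=query)
resp.raise_for_status()
for row in resp.json():
    print(row)
```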
Agenda -
1) Introduction and Ideal Use cases for Druid
2) Data Architecture
3) Streaming Ingestion with Kafka
4) Demo using Druid, Kafka and Superset.
5) Recent improvements in Druid: moving from a Lambda architecture to exactly-once ingestion
6) Future Work
High Scale Relational Storage at Salesforce Built with Apache HBase and Apach...Salesforce Engineering
Apache HBase is an open source, non-relational, distributed datastore modeled after Google’s Bigtable, that runs on top of the Apache Hadoop Distributed Filesystem and provides low-latency random-access storage for HDFS-based compute platforms like Apache Hadoop and Apache Spark. Apache Phoenix is a high performance relational database layer over HBase optimized for low latency applications. This session will explore how the Data Platform and Services group at Salesforce.com supports teams of application developers accustomed to structured relational data access, while surfacing additional advantages of the underlying flexible scale-out datastore.
Kafka is a high-throughput, fault-tolerant, scalable platform for building high-volume near-real-time data pipelines. This presentation is about tuning Kafka pipelines for high-performance.
Select configuration parameters and deployment topologies essential to achieving higher throughput and low latency across the pipeline are discussed, along with lessons learned in troubleshooting and optimizing a truly global data pipeline that replicates 100 GB of data in under 25 minutes.
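As a small illustration of the kinds of knobs involved, here is a hedged sketch of a throughput-oriented producer using the confluent-kafka Python client. Broker addresses and the topic are placeholders, and the specific values are starting points, not recommendations.

```python
from confluent_kafka import Producer

# Throughput-oriented producer settings: larger batches, short linger,
# compression, and acks=all for durability. Values are illustrative.
producer = Producer({
    "bootstrap.servers": "broker1:9092,broker2:9092",
    "linger.ms": 20,              # wait briefly to fill larger batches
    "batch.size": 131072,         # 128 KB batches
    "compression.type": "lz4",    # cheap CPU, big network savings
    "acks": "all",                # wait for in-sync replicas
})

def delivery(err, msg):
    if err is not None:
        print(f"delivery failed: {err}")

for i in range(100_000):
    producer.produce("events", key=str(i), value=f"payload-{i}", callback=delivery)
    producer.poll(0)  # serve delivery callbacks

producer.flush()
```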
Kafka Streams vs. KSQL for Stream Processing on top of Apache KafkaKai Wähner
Spoilt for Choice – Kafka Streams vs. KSQL for Stream Processing on top of Apache Kafka:
Apache Kafka is a de facto standard streaming data processing platform and is widely deployed as an event streaming platform. Part of Kafka is its stream processing API, “Kafka Streams”. In addition, the Kafka ecosystem now offers KSQL, a declarative, SQL-like stream processing language that lets you define powerful stream-processing applications easily. What once took some moderately sophisticated Java code can now be done at the command line with a familiar and eminently approachable syntax.
This session discusses and demos the pros and cons of Kafka Streams and KSQL to understand when to use which stream processing alternative for continuous stream processing natively on Apache Kafka infrastructures. The end of the session compares the trade-offs of Kafka Streams and KSQL to separate stream processing frameworks such as Apache Flink or Spark Streaming.
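To make the comparison concrete, here is a hedged sketch of submitting a KSQL/ksqlDB statement over its REST API. It assumes a ksqlDB server on localhost:8088; the stream, topic, and column names are made up.

```python
import requests

# ksqlDB servers accept statements at the /ksql endpoint.
# Stream, topic, and column names below are illustrative.
statement = """
CREATE STREAM pageviews (user_id VARCHAR, url VARCHAR)
  WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON');
"""

resp = requests.post(
    "http://localhost:8088/ksql",
    headers={"Content-Type": "application/vnd.ksql.v1+json"},
    json={"ksql": statement, "streamsProperties": {}},
)
resp.raise_for_status()
print(resp.json())
```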
Real-time Analytics with Trino and Apache PinotXiang Fu
Trino summit 2021:
Overview of Trino Pinot Connector, which bridges the flexibility of Trino's full SQL support to the power of Apache Pinot's realtime analytics, giving you the best of both worlds.
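A minimal sketch of what querying Pinot through Trino might look like with the Trino Python client. The coordinator host, catalog, schema, and table names are assumptions.

```python
from trino.dbapi import connect

# Connect to a Trino coordinator and query a table exposed by the Pinot connector.
# Host, catalog, schema, and table names are illustrative.
conn = connect(
    host="trino-coordinator",
    port=8080,
    user="analyst",
    catalog="pinot",
    schema="default",
)
cur = conn.cursor()
cur.execute(
    "SELECT country, COUNT(*) AS events "
    "FROM clickstream "
    "GROUP BY country "
    "ORDER BY events DESC "
    "LIMIT 10"
)
for row in cur.fetchall():
    print(row)
```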
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...Kai Wähner
Architecture patterns for distributed, hybrid, edge and global Apache Kafka deployments
Multi-cluster and cross-data center deployments of Apache Kafka have become the norm rather than an exception. This session gives an overview of several scenarios that may require multi-cluster solutions and discusses real-world examples with their specific requirements and trade-offs, including disaster recovery, aggregation for analytics, cloud migration, mission-critical stretched deployments and global Kafka.
Key takeaways:
In many scenarios, one Kafka cluster is not enough. Understand different architectures and alternatives for multi-cluster deployments.
Zero data loss and high availability are two key requirements. Understand how to realize this, including trade-offs.
Learn about features and limitations of Kafka for multi-cluster deployments.
Global Kafka and mission-critical multi-cluster deployments with zero data loss and high availability have become the norm, not the exception.
Kafka is an open-source distributed commit log service that provides high-throughput messaging functionality. It is designed to handle large volumes of data and different use cases, like online and offline processing, more efficiently than alternatives like RabbitMQ. Kafka works by partitioning topics into partitions spread across clusters of machines and replicating those partitions for fault tolerance. It can be used as a central data hub or pipeline for collecting, transforming, and streaming data between systems and applications.
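A minimal sketch of the partitioning and replication described above, using the confluent-kafka admin client to create a topic. The broker address, topic name, and counts are placeholders.

```python
from confluent_kafka.admin import AdminClient, NewTopic

# Create a topic with several partitions and a replication factor,
# mirroring how Kafka spreads and replicates data across brokers.
admin = AdminClient({"bootstrap.servers": "localhost:9092"})
futures = admin.create_topics([NewTopic("clicks", num_partitions=6, replication_factor=3)])

for topic, future in futures.items():
    try:
        future.result()  # raises on failure
        print(f"created {topic}")
    except Exception as exc:
        print(f"failed to create {topic}: {exc}")
```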
Introduction to Apache Flink - Fast and reliable big data processingTill Rohrmann
This presentation introduces Apache Flink, a massively parallel data processing engine that is currently undergoing incubation at the Apache Software Foundation. Flink's programming primitives are presented, and it is shown how easily a distributed PageRank algorithm can be implemented with Flink. Intriguing features such as dedicated memory management, Hadoop compatibility, streaming, and automatic optimisation make it a unique system in the world of Big Data processing.
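The PageRank example mentioned above boils down to a repeated rank redistribution over the link graph. A tiny, framework-free sketch of that power iteration (illustrating the algorithm only, not Flink's API) might look like this:

```python
# Plain-Python power iteration for PageRank on a small adjacency list.
# This illustrates the algorithm the talk distributes with Flink, not Flink's API.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}  # toy graph
damping, iterations = 0.85, 30
pages = list(links)
rank = {p: 1.0 / len(pages) for p in pages}

for _ in range(iterations):
    new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
    for page, outlinks in links.items():
        share = damping * rank[page] / len(outlinks)
        for target in outlinks:
            new_rank[target] += share
    rank = new_rank

print(sorted(rank.items(), key=lambda kv: -kv[1]))
```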
Here are the key steps to create an application from the catalog in the OpenShift web console:
1. Click on "Add to Project" on the top navigation bar and select "Browse Catalog".
2. This will open the catalog page showing available templates. You can search for a template or browse by category.
3. Select the template you want to use, for example Node.js.
4. On the next page you can review the template details and parameters. Fill in any required parameters.
5. Click "Create" to instantiate the template and create the application resources in your current project.
6. OpenShift will then provision the application, including building container images if required.
Serverless Kafka on AWS as Part of a Cloud-native Data Lake ArchitectureKai Wähner
AWS Data Lake / Lake House + Confluent Cloud for Serverless Apache Kafka. Learn about use cases, architectures, and features.
Data must be continuously collected, processed, and reactively used in applications across the entire enterprise - some in real time, some in batch mode. In other words: As an enterprise becomes increasingly software-defined, it needs a data platform designed primarily for "data in motion" rather than "data at rest."
Apache Kafka is now mainstream when it comes to data in motion! The Kafka API has become the de facto standard for event-driven architectures and event streaming. Unfortunately, the cost of running it yourself is very often too expensive when you add factors like scaling, administration, support, security, creating connectors...and everything else that goes with it. Resources in enterprises are scarce: this applies to both the best team members and the budget.
The cloud - as we all know - offers the perfect solution to such challenges.
Most likely, fully-managed cloud services such as AWS S3, DynamoDB or Redshift are already in use. Now it is time to implement "fully-managed" for Kafka as well - with Confluent Cloud on AWS.
- Building a central integration layer that doesn't care where or how much data is coming from
- Implementing scalable data stream processing to gain real-time insights
- Leveraging fully managed connectors (like S3, Redshift, Kinesis, MongoDB Atlas & more) to quickly access data
Confluent Cloud in action? Let's show how ao.com made it happen!
The tech talk was given by Ranjeeth Kathiresan, Salesforce Senior Software Engineer, and Gurpreet Multani, Salesforce Principal Software Engineer, in June 2017.
(Stephane Maarek, DataCumulus) Kafka Summit SF 2018
Security in Kafka is a cornerstone of true enterprise production-ready deployment: It enables companies to control access to the cluster and limit risks in data corruption and unwanted operations. Understanding how to use security in Kafka and exploiting its capabilities can be complex, especially as the documentation that is available is aimed at people with substantial existing knowledge on the matter.
This talk will be delivered in a “hero journey” fashion, tracing the experience of an engineer with basic understanding of Kafka who is tasked with securing a Kafka cluster. Along the way, I will illustrate the benefits and implications of various mechanisms and provide some real-world tips on how users can simplify security management.
Attendees of this talk will learn about aspects of security in Kafka, including:
-Encryption: What is SSL, what problems it solves and how Kafka leverages it. We’ll discuss encryption in flight vs. encryption at rest.
-Authentication: Without authentication, anyone would be able to write to any topic in a Kafka cluster, do anything and remain anonymous. We’ll explore the available authentication mechanisms and their suitability for different types of deployment, including mutual SSL authentication, SASL/GSSAPI, SASL/SCRAM and SASL/PLAIN.
-Authorization: How ACLs work in Kafka, ZooKeeper security (risks and mitigations) and how to manage ACLs at scale
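On the client side, much of this boils down to a handful of settings. Here is a hedged sketch of a consumer using TLS encryption in flight plus SASL/SCRAM authentication with the confluent-kafka Python client; addresses, credentials, and file paths are placeholders.

```python
from confluent_kafka import Consumer

# TLS in flight plus SASL/SCRAM authentication; all values are placeholders.
consumer = Consumer({
    "bootstrap.servers": "broker1:9093",
    "security.protocol": "SASL_SSL",
    "ssl.ca.location": "/etc/kafka/ca.pem",   # CA used to verify the brokers
    "sasl.mechanism": "SCRAM-SHA-256",
    "sasl.username": "app-user",
    "sasl.password": "app-secret",
    "group.id": "secure-app",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["payments"])

msg = consumer.poll(timeout=5.0)
if msg is not None and msg.error() is None:
    print(msg.value())
consumer.close()
```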
The document provides an overview of Red Hat OpenShift Container Platform, including:
- OpenShift provides a fully automated Kubernetes container platform for any infrastructure.
- It offers integrated services like monitoring, logging, routing, and a container registry out of the box.
- The architecture runs everything in pods on worker nodes, with masters managing the control plane using Kubernetes APIs and OpenShift services.
- Key concepts include pods, services, routes, projects, configs and secrets that enable application deployment and management.
Flink Forward San Francisco 2018: Andrew Gao & Jeff Sharpe - "Finding Bad Ac...Flink Forward
Within fintech catching fraudsters is one of the primary opportunities for us to use streaming applications to apply ML models in real-time. This talk will be a review of our journey to bring fraud decisioning to our tellers at Capital One using Kafka, Flink and AWS Lambda. We will share our learnings and experiences to common problems such as custom windowing, breaking down a monolith app to small queryable state apps, feature engineering with Jython, dealing with back pressure from combining two disparate streams, model/feature validation in a regulatory environment, and running Flink jobs on Kubernetes.
This session is focused on HashiCorp Vault, a secret management tool. Managing secrets for two or three environments is feasible by hand, but with more than ten environments it becomes very painful, especially when secrets are dynamic and need to be rotated periodically. HashiCorp Vault can manage both static and dynamic secrets and can also handle secret rotation.
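A minimal sketch of writing and reading a static secret with Vault's Python client, hvac. The address, token, path, and keys are placeholders; in practice the token would come from an auth method rather than being hardcoded.

```python
import hvac

# Connect to a Vault server; address, token, and paths are illustrative.
client = hvac.Client(url="http://127.0.0.1:8200", token="dev-only-token")

# Write a secret into the KV v2 engine...
client.secrets.kv.v2.create_or_update_secret(
    path="myapp/config",
    secret={"db_password": "s3cr3t"},
)

# ...and read it back.
read = client.secrets.kv.v2.read_secret_version(path="myapp/config")
print(read["data"]["data"]["db_password"])
```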
Presentation at Strata Data Conference 2018, New York
The controller is the brain of Apache Kafka. A big part of what the controller does is to maintain the consistency of the replicas and determine which replica can be used to serve the clients, especially during individual broker failure.
Jun Rao outlines the main data flow in the controller—in particular, when a broker fails, how the controller automatically promotes another replica as the leader to serve the clients, and when a broker is started, how the controller resumes the replication pipeline in the restarted broker.
Jun then describes recent improvements to the controller that allow it to handle certain edge cases correctly and increase its performance, which allows for more partitions in a Kafka cluster.
Apache Phoenix: Transforming HBase into a SQL DatabaseDataWorks Summit
The document discusses Apache Phoenix, which transforms HBase into a SQL database by providing a query engine, metadata repository, and embedded JDBC driver to access HBase data. It is the fastest way to access HBase data thanks to techniques like push-down query optimization and client-side parallelization. Phoenix also helps HBase scale by allowing multiple tables to share the same physical HBase table through updateable views and multi-tenant tables and views.
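A small sketch of what SQL access over HBase through Phoenix can look like from Python, using the phoenixdb driver against a Phoenix Query Server. The server URL, table, and columns are assumptions.

```python
import phoenixdb

# Connect to a Phoenix Query Server; URL and table name are illustrative.
conn = phoenixdb.connect("http://phoenix-queryserver:8765/", autocommit=True)
cur = conn.cursor()

cur.execute(
    "CREATE TABLE IF NOT EXISTS events ("
    "  id BIGINT NOT NULL PRIMARY KEY,"
    "  user_id VARCHAR,"
    "  ts TIMESTAMP)"
)
cur.execute("UPSERT INTO events (id, user_id, ts) VALUES (1, 'u-42', CURRENT_TIME())")
cur.execute("SELECT user_id, COUNT(*) FROM events GROUP BY user_id")
print(cur.fetchall())
```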
The document discusses Orion, the son of Poseidon in Greek mythology. It describes how Orion could walk on water and was blinded as punishment for misbehaving on an island. Orion then stumbled upon Hephaestus's forge on Lemnos, where Hephaestus's servant Cedalion guided Orion and carried him on his shoulders to the east, where the sun healed Orion's blindness. The passage references Isaac Newton's quote about standing on the shoulders of giants to see further.
apidays LIVE Singapore - Next-generation microservice architecture based on A...apidays
apidays LIVE Singapore 2021 - Digitisation, Connected Services and Embedded Finance
April 21 & 22, 2021
Next-generation microservice architecture based on Apache APISIX
Ming Wen, Apache APISIX PMC Chair at Apache Software Foundation
Show Me Kafka Tools That Will Increase My Productivity! (Stephane Maarek, Dat...confluent
In the Apache Kafka world, there is such a great diversity of open source tools available (I counted over 50!) that it’s easy to get lost. Over the years I have dealt with Kafka, I have learned to particularly enjoy a few of them that save me a tremendous amount of time over performing manual tasks. I will be sharing my experience and doing live demos of my favorite Kafka tools, so that you too can hopefully increase your productivity and efficiency when managing and administering Kafka. Come learn about the latest and greatest tools for CLI, UI, Replication, Management, Security, Monitoring, and more!
Apache Kafka in the Airline, Aviation and Travel IndustryKai Wähner
Aviation and travel are notoriously vulnerable to social, economic, and political events, as well as the ever-changing expectations of consumers. Coronavirus is just a piece of the challenge.
This presentation explores use cases, architectures, and references for Apache Kafka as event streaming technology in the aviation industry, including airline, airports, global distribution systems (GDS), aircraft manufacturers, and more.
Examples include Lufthansa, Singapore Airlines, Air France Hop, Amadeus, and more. Technologies include Kafka, Kafka Connect, Kafka Streams, ksqlDB, Machine Learning, Cloud, and more.
This document provides an overview and best practices for operating HBase clusters. It discusses HBase and Hadoop architecture, how to set up an HBase cluster including Zookeeper and region servers, high availability considerations, scaling the cluster, backup and restore processes, and operational best practices around hardware, disks, OS, automation, load balancing, upgrades, monitoring and alerting. It also includes a case study of a 110 node HBase cluster.
The document discusses several key factors for optimizing HBase performance including:
1. Reads and writes compete for disk, network, and thread resources so they can cause bottlenecks.
2. Memory allocation needs to balance space for memstores, block caching, and Java heap usage.
3. The write-ahead log can be a major bottleneck and increasing its size or number of logs can improve write performance.
4. Flushes and compactions need to be tuned to avoid premature flushes causing "compaction storms".
Designing Scalable Data Warehouse Using MySQLVenu Anuganti
The document discusses designing scalable data warehouses using MySQL. It covers topics like the role of MySQL in data warehousing and analytics, typical data warehouse architectures, scaling out MySQL, and limitations of MySQL for large datasets or as a scalable warehouse solution. Real-time analytics are also discussed, noting the challenges of performance and scalability for near real-time analytics.
This document discusses tuning HBase and HDFS for performance and correctness. Some key recommendations include:
- Enable HDFS sync on close and sync behind writes for correctness on power failures.
- Tune HBase compaction settings like blockingStoreFiles and compactionThreshold based on whether the workload is read-heavy or write-heavy.
- Size RegionServer machines based on disk size, heap size, and number of cores to optimize for the workload.
- Set client and server RPC chunk sizes like hbase.client.write.buffer to 2MB to maximize network throughput.
- Configure various garbage collection settings in HBase like -Xmn512m and -XX:+UseCMSInit
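A hedged sketch of a few of the properties mentioned above, expressed as a small Python dict of the sort a deployment script might render into hbase-site.xml. The property names follow the talk; the values are examples, not recommendations.

```python
# Illustrative hbase-site.xml overrides for a write-heavy workload.
HBASE_SITE_OVERRIDES = {
    "hbase.client.write.buffer": str(2 * 1024 * 1024),            # 2 MB client write buffer
    "hbase.hstore.blockingStoreFiles": "16",                        # delay blocking writers
    "hbase.hstore.compactionThreshold": "5",                        # fewer, larger compactions
    "hbase.hregion.memstore.flush.size": str(256 * 1024 * 1024),   # 256 MB memstore flushes
}

def to_hbase_site_xml(props: dict) -> str:
    """Render properties in hbase-site.xml format."""
    entries = "".join(
        f"  <property><name>{k}</name><value>{v}</value></property>\n"
        for k, v in props.items()
    )
    return f"<configuration>\n{entries}</configuration>\n"

print(to_hbase_site_xml(HBASE_SITE_OVERRIDES))
```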
The document discusses Salesforce's plans to enhance its streaming and enterprise messaging capabilities. It describes the limitations of the current streaming API and previews new "durable streaming" functionality that will allow events to be replayed. It also outlines Salesforce's vision for an "Enterprise Messaging" platform using an event-driven architecture and "Conduit" API to reliably deliver data changes in real time or from a replay log. A roadmap is provided showing phased rollout of these capabilities through 2016.
HBase is a scalable NoSQL database modeled after Google's Bigtable. It is built on top of HDFS for storage, and uses Zookeeper for distributed coordination and failover. Data in HBase is stored in tables and sorted by row key, with columns grouped into families and cells containing values and timestamps. HBase tables are split into regions for scalability and fault tolerance, with a master server coordinating region locations across multiple region servers.
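A minimal sketch of the row-key and column-family data model using the happybase Thrift client. The Thrift gateway host, table, and column family names are placeholders.

```python
import happybase

# Connect via the HBase Thrift gateway; host and names are illustrative.
connection = happybase.Connection("hbase-thrift-host")
table = connection.table("users")

# Cells live under column families; rows are sorted by row key.
table.put(b"user#0042", {b"info:name": b"Ada", b"info:email": b"ada@example.com"})

row = table.row(b"user#0042")
print(row[b"info:name"])

# Row-key ordering makes prefix scans cheap.
for key, data in table.scan(row_prefix=b"user#"):
    print(key, data)

connection.close()
```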
This document discusses Bronto's use of HBase for their marketing platform. Some key points:
- Bronto uses HBase for high volume scenarios, realtime data access, batch processing, and as a staging area for HDFS.
- HBase tables at Bronto are designed with the read/write patterns and necessary queries in mind. Row keys and column families are structured to optimize for these access patterns.
- Operations of HBase at scale require tuning of JVM settings, monitoring tools, and custom scripts to handle compactions and prevent cascading failures during high load. Table design also impacts operations and needs to account for expected workloads.
Salesforce External Objects for Big DataSumit Sarkar
Transform Salesforce into the system of engagement for your big data. Discuss best practices and lessons learned in accessing external data sets in Hadoop or Spark using Salesforce Connect. Leave the big data sets behind the firewall, and get on demand access for your users to big data insights using external objects with Salesforce Connect.
In this session we will cover:
Intro to Salesforce Connect
Intro to Big Data Landscape
How to connect Salesforce to Big Data using External Data Sources
Lessons Learned accessing Big Data using External Objects for native reporting, writes, lookups, search and more
Resources (How to learn more)
HBase In Action - Chapter 04: HBase table designphanleson
HBase In Action - Chapter 04: HBase table design
Learning HBase, Real-time Access to Your Big Data, Data Manipulation at Scale, Big Data, Text Mining, HBase, Deploying HBase
Salesforce for Nonprofits: Turn Big Data into Social ChangeSalesforce.org
Salesforce Analytics Cloud is Analytics for the Rest of Us and leading nonprofits are already showing how big data can help solve the world’s complex problems. Learn how Project 8 is using Analytics Cloud to help ensure that the 8 billion people that will live on this earth in 15 years will have the food, water, and energy they need.
Apache Hadoop and Spark are best-of-breed technologies for distributed processing and storage of very large data sets: Big Data. Join us as we explain how to integrate Salesforce with off-the-shelf big data tools to build flexible applications. You'll also learn how Force.com is evolving in this area and how Big Objects and Data Pipelines will provide Big Data capability within the platform.
Have a lot of data? Using or considering using Apache HBase (part of the Hadoop family) to store your data? Want to have your cake and eat it too? Phoenix is an open source project put out by Salesforce. Join us to learn how you can continue to use SQL, but get the raw speed of native HBase usage through Phoenix.
Analyze billions of records on Salesforce App Cloud with BigObjectSalesforce Developers
Salesforce hosts billions of customer records on Salesforce App Cloud. Making timely decisions on this invaluable data demands a new set of capabilities. From interacting with data in real-time to leveraging a fluid integration with Salesforce Analytics, these capabilities are just around the corner. Join us in this roadmap session to see what the near-future of Big Data on Salesforce App Cloud looks like and how you can benefit from it.
Key Takeaways
- Learn what 100 billion+ records on the Salesforce App Cloud could actually mean to you.
- Understand new services such as AsyncSOQL that can deliver reliable, resilient query capabilities over your sObjects and BigObjects.
- Gain insights for large-scale federated data filtering and aggregation.
- Transform data movement so all your customer records are available across their life cycle.
Intended Audience
This session is for Salesforce Administrators, Developers, Architects and just about anyone who wants to learn more about BigObjects!
Tagging and Processing Data in Real Time-(Hari Shreedharan and Siddhartha Jai...Spark Summit
This document discusses using Spark Streaming to process and normalize log streams in real time from 100k events per second to over 1 million per second. It proposes using RSyslog to collect logs from multiple sources into Kafka, then using Spark Streaming to apply regex matching and extract fields to normalize the data into a structured JSON format and write it to additional Kafka topics for storage and further processing. The solution was able to process 3 billion events per day with less than 20 seconds of end-to-end delay at peak throughput.
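A rough Structured Streaming sketch of that normalize-and-republish pattern. The Kafka brokers, topic names, regex, and checkpoint path are assumptions, and the original work used Spark Streaming's DStream API rather than this newer interface.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("log-normalizer").getOrCreate()

# Read raw syslog lines from Kafka; brokers and topics are placeholders.
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "raw-logs")
    .load()
)

# Extract fields with a regex and republish as structured JSON.
line = F.col("value").cast("string")
pattern = r"^(\S+) (\S+) \[(.+?)\] \"(\S+) (\S+)"  # illustrative access-log pattern
normalized = raw.select(
    F.to_json(
        F.struct(
            F.regexp_extract(line, pattern, 1).alias("host"),
            F.regexp_extract(line, pattern, 4).alias("method"),
            F.regexp_extract(line, pattern, 5).alias("path"),
        )
    ).alias("value")
)

query = (
    normalized.writeStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("topic", "normalized-logs")
    .option("checkpointLocation", "/tmp/ckpt/log-normalizer")
    .start()
)
query.awaitTermination()
```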
Column-Stores vs. Row-Stores: How Different are they Really?Daniel Abadi
The document compares the performance of row-stores and column-stores for data warehousing workloads. It finds that with certain optimizations, the performance difference can be minimized:
A row-store can match the performance of a column-store by vertically partitioning columns and allowing virtual tuple IDs. Removing optimizations from a column-store, like compression and late materialization, causes its performance to degrade to that of a row-store. While column stores are better suited for data warehousing, row-stores can achieve similar performance with improvements to support vertical partitioning and column-specific optimizations.
Spark Streaming allows real-time processing of live data streams. It works by dividing the live stream into small batches, represented as discretized streams (DStreams), which are then processed using Spark's batch API. Common sources of data include Kafka, files, and sockets. Transformations like map, reduce, join, and window can be applied to DStreams. Stateful operations like updateStateByKey allow updating persistent state. Checkpointing to reliable storage like HDFS provides fault tolerance.
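A compact DStream sketch showing windowing and updateStateByKey with checkpointing. The socket source, host, port, and checkpoint path are placeholders.

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="dstream-demo")
ssc = StreamingContext(sc, batchDuration=5)          # 5-second micro-batches
ssc.checkpoint("hdfs:///tmp/dstream-checkpoint")     # required for stateful operations

lines = ssc.socketTextStream("localhost", 9999)       # placeholder source
pairs = lines.flatMap(lambda l: l.split()).map(lambda w: (w, 1))

# Sliding-window counts: 60-second window, evaluated every 10 seconds.
windowed = pairs.reduceByKeyAndWindow(lambda a, b: a + b, lambda a, b: a - b, 60, 10)

# Running totals maintained across batches with updateStateByKey.
def update(new_values, running):
    return sum(new_values) + (running or 0)

totals = pairs.updateStateByKey(update)

windowed.pprint()
totals.pprint()

ssc.start()
ssc.awaitTermination()
```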
Developer burnout is sneaky and can slowly grind at the developer until constant fatigue and lack of motivation become normal. Key signs include realizing you've been doing the same thing for years without enjoyment. Success in open source can lead to taking on too many projects and goals without delegation. To avoid burnout, developers should be gentle with themselves, publish less to communities while still influencing them, delegate tasks, pursue other interests and hobbies, and take breaks if needed. Selectively engaging with others on social media can also help prevent burnout.
Watch this webinar to discover new and updated Salesforce Platform features coming in the Winter '14 Release including:
Force.com Canvas -- Force.com Canvas continues to add useful features such as ability to access a Canvas app from the Chatter Publisher Action, support for the Streaming API along with modified user permissions and SDK field changes.
API Updates -- New features added to SOQL, SOSL, REST API, SOAP API, Chatter API, Metadata API and the Streaming API. Additionally, we continue to make performance improvements to the Bulk API, Tooling API and Analytics REST API.
Visualforce Updates -- Visualforce enhancements in Winter ’14 are focused on improving the experience of developing HTML5 apps, with some additional development tools improvements and other changes.
Developer Console -- New features have been added to make code management within your organization much easier.
Apex Code -- New classes, methods and interfaces have been added. Updates have been made to Chatter in Apex as well as new classes have been included in Winter ‘14.
This document discusses Einstein Analytics and data management. It provides an overview of Einstein Analytics architecture and data sync/connectors. It then demonstrates how to set up a dataflow to import CSV data, create a recipe, modify metadata, and register a dataset. Upcoming features discussed include joining datasets and suggested values in queries. The key benefits mentioned of managing data in the data layer rather than design layer are reducing dependencies on SAQL/JSON and adding common fields to datasets.
Moving to the cloud provides several benefits for Helix Linear Technologies' ERP system. It transitions the company from an on-premise system requiring specialized expertise to maintain, to a less complex, cloud-based system integrated with Salesforce. This reduces complexity, administration needs, and customization/integration challenges. Specifically, Helix is moving to Kenandy, a native Force.com application, to gain the standard customization and integration of the Salesforce platform. The cloud-based system will also improve processes like capacity planning, mobile shop floor management, labor costing, and quality assurance.
Part of what truly makes a platform is an ability to integrate with third party devices, servers and software. Join Ami Assayag and Kirk Steffke from CRM Science and Developer Evangelist Josh Birk as they breakdown examples of using Apex for integration solutions. Apex has robust methods for handling both inbound requests into Salesforce and outbound calls into third party systems. This webinar will break down how Apex can be used in these cases as well as how to test the code once it is up and running.
Key Takeaways
- How Apex fits into an integration solution
- Using Apex to create custom endpoints
- Handling outbound calls with Apex
- How to achieve test coverage with mock interfaces
Intended Audience
Developers with Apex experience looking either to integrate with existing APIs or to expand the functionality of Salesforce APIs.
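To illustrate the inbound side, here is a hedged sketch of calling a custom Apex REST endpoint from an external Python client. The instance URL, access token, the "Cases" URL mapping, and the payload fields are hypothetical, and the Apex @RestResource class that would back the endpoint is not shown.

```python
import requests

# Placeholders: a real integration would obtain the token via OAuth.
INSTANCE_URL = "https://example.my.salesforce.com"
ACCESS_TOKEN = "00D...session-token"

# Custom Apex REST endpoints are exposed under /services/apexrest/<urlMapping>.
resp = requests.post(
    f"{INSTANCE_URL}/services/apexrest/Cases",
    headers={
        "Authorization": f"Bearer {ACCESS_TOKEN}",
        "Content-Type": "application/json",
    },
    json={"subject": "Printer on fire", "priority": "High"},
)
resp.raise_for_status()
print(resp.json())
```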
"We'll need an Apex trigger to do that." Sound familiar? Take your advanced Admin skills to the next level by developing Apex triggers to solve complex business requirements that can't be implemented using just the configuration-driven features of Force.com. Join us to learn when and how to write your first Apex trigger, and some best practices for making them effective.
Spice up Your Internal Portal with Visualforce and Twitter BootstrapSalesforce Developers
Does your intranet or internal portal need updating? Join us to see how we transformed our Employee News page into a robust, mobile design using Visualforce and Twitter Bootstrap in a little over two months. Designed with a responsive approach, this methodology can additionally be used throughout Visualforce pages within other projects.
This document summarizes a presentation about building real-time apps with Node.js, Heroku, and the Force.com Streaming API. It includes a safe harbor statement and introduces the speakers. The presentation discusses the evolution of the web from static to dynamic and database-driven sites, and how new demands require real-time data. It describes using technologies like Node.js, web sockets via Socket.io, and the Force.com Streaming API to build real-time apps. CSS3 and 3D transformations are used for visual effects in the presentation. The speakers are then introduced.
The document discusses Platform as a Service (PaaS) selection criteria in 2017 with a focus on the latest information about Heroku. It provides an overview of Heroku's capabilities including processing over 80 billion items daily for over 500,000 developed apps. Examples are given of deploying a Node.js app to Heroku using Git. Key criteria for selecting a PaaS like Heroku are mentioned such as scalability, ease of deployment, and availability of add-ons. The conclusion encourages attending upcoming Heroku events for more information.
Want to get into Flows but need a jumpstart? Join us as we share Flow best practices we've learned the hard way: for instance, using constants instead of hardcoded values, using an "Error handling" Flow to capture errors, and using subflows, similar to the common guidance of one trigger per object.
Talk given by Ravi Kishore Valeti, Software Engineering LMTS at Salesforce, at GIDS in April 2016
Most enterprises have been thinking about running BDaaS (and some already are) and performing analytics over their Big Data to help make key business decisions. This talk is about what it takes to operationalize BDaaS and the challenges in successfully running large-scale Big Data clusters.
Df14 Building Machine Learning Systems with Apexpbattisson
Slide deck from the Dreamforce 2014 talk "Building Machine Learning Systems with Apex". Includes links to the GitHub code repository and contact details for the speakers.
Data Democracy: Use Lightning Connect & Heroku to Visualize any Data, AnywhereSalesforce Developers
Join us as we demonstrate an Open OData Adapter for Lightning Connect. No schema, no code, no build. Just hit "Heroku Deploy" and the endpoint will be live on Heroku with a custom model as an OData service which can then be consumed by Lightning Connect. This session provides a great introduction for IT organizations who want to develop OData services for their backend systems and accelerate Lightning Connect adoption.
This document provides an overview of Wave App Development by Skip Sauls of Salesforce. It discusses how Wave allows anyone to build analytics apps for various use cases like sales, service, marketing, and custom apps. The architecture of Wave leverages Force.com and its API can be used to build components. The roadmap discusses enhancing Wave with more data sources, advanced analytics, predictive capabilities, and tools to more easily build and share apps.
Finding relevant results faster with ElasticsearchElasticsearch
With today's growing amounts of information, it's critical to be able to efficiently retrieve the most relevant results for a query. Learn how Elasticsearch finds top hits for a query so quickly by building inverted indexes and using the Block-MAX WAND algorithm, as well as how you can tune Elasticsearch to make it even more efficient for your own use cases.
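A tiny sketch of the query side with the official Python client. The host, index name, and field are placeholders.

```python
from elasticsearch import Elasticsearch

# Host, index, and field names are illustrative.
es = Elasticsearch("http://localhost:9200")

resp = es.search(
    index="articles",
    query={"match": {"title": "kafka streams"}},
    size=10,
)
for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["title"])
```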
Doc is Dead! How Walkthroughs Changed Salesforce's Content StrategyGavin Austin
A case study on why the Documentation and User Assistance team at Salesforce changed its content strategy to include more forms of interactive content.
One of the core advantages of the Salesforce Analytics Cloud is the ability to process extremely large data sets without sacrificing performance. Building those data sets is not always easy. Vendor tools do a great job but can be expensive and, in some cases, organizations need complete control over their data transformations. In this session, we will explore the different native options available for loading data into the Analytics Cloud, and review the pros and cons of using Apex. We will demonstrate a simple data loader app built with the Analytics Cloud simple API.
In this talk you will learn:
How to structure your JS-heavy project in Salesforce DX
Learn how to use all the familiar JS tools with Webpack and Lightning
Techniques to Effectively Monitor the Performance of Customers in the CloudSalesforce Engineering
This document discusses techniques for effectively monitoring customer performance in the cloud. It recommends establishing a baseline for normal performance and monitoring metrics and thresholds to detect deviations. Key metrics to track include counts, medians, percentiles, and distributions over time. Dashboards should visualize these metrics and allow comparing performance across different time periods. An example dashboard monitors adoption, errors, and metrics over the last 30 days and compares to the same day last week. The presentation demonstrates an Einstein Analytics dashboard for interactive analysis across devices.
HBase is a healthy, stable, and popular open source distributed database that is celebrating its 10th birthday. It has over 160 contributors and developers, with steady releases being made across multiple active versions. Improvements and the 2.0 release are upcoming, building on strong community involvement and contributions over its history.
This document summarizes Salesforce's use of HBase and Phoenix for storing and querying large volumes of structured and unstructured data at scale. Some key details:
1) Salesforce heavily uses HBase and Phoenix for both customer-facing and internal use cases, including storing login data, user activity, thread dumps, and more.
2) Salesforce operates over 100 HBase clusters of varying sizes to support over 4 billion write requests and 600 million read requests per day, totaling over 80 terabytes of data written and 500 gigabytes read daily.
3) An example use case is a central metrics database collecting data from over 80,000 machines, storing 11.4 trillion metrics and growing, with
The tech talk was given by Kexin Xie, Director of Data Science, and Yacov Salomon, VP of Data Science in June 2017.
Scaling up data science applications: How switching to Spark improved performance and reliability, and reduced cost
Cem Gurkok presented on containers and security. The presentation covered threats to containers like container exploits and tampering of images. It discussed securing the container pipeline through steps like signing, authentication, and vulnerability scans. It also covered monitoring containers and networks, digital forensics techniques, hardening containers and hosts, and vulnerability management.
This document provides an overview of aspect-oriented programming (AOP) and various AOP implementations. It begins with an introduction to AOP concepts like cross-cutting concerns. It then discusses the AOP frameworks AspectJ and Spring AOP, covering their pointcut and advice anatomy. The document also examines how AOP can be used for code coverage, benchmarks, improved compilation, and application monitoring. It analyzes implementations like JaCoCo, JMH, HotswapAgent, and AppDynamics as examples.
This document discusses using XHProf to perform performance tuning of PHP applications. It begins with an introduction of the speaker and their company Pardot. It then provides an overview of XHProf including how to install, configure, and use it to profile PHP applications. The document outlines various performance tips for PHP such as optimizing array operations, managing memory efficiently, and improving database queries. It also walks through some examples of profiling a sample Symfony application that involves getting click data from a database. The examples demonstrate how to optimize queries and object hydration to improve performance.
A Smarter Pig: Building a SQL interface to Pig using Apache CalciteSalesforce Engineering
This document summarizes a presentation about building a SQL interface for Apache Pig using Apache Calcite. It discusses using Calcite's query planning framework to translate SQL queries into Pig Latin scripts for execution on HDFS. The presenters describe their work at Salesforce using Calcite for batch querying across data sources, and outline their process for creating a Pig adapter for Calcite, including implementing Pig-specific operators and rules for translation. Lessons learned include that Calcite provides flexibility but documentation could be improved, and examples from other adapters were helpful for their implementation.
The document discusses implementing a content strategy and outlines some key lessons learned. It notes that implementing a content strategy is like running a long distance and will involve pain, relationships, and focusing on strengths over weaknesses. It advises getting ready for the pain involved, not trying to do it alone, and leveraging strengths rather than weaknesses. The presentation encourages the audience to take action by volunteering or taking the next step.
The tech talk was given by Jim Walsh, Salesforce SVP Infrastructure Engineering in May 2017.
The presentation provides a brief overview of Salesforce Cloud Infrastructure and Challenges.
Koober is an open-source interactive website that uses machine learning models trained on historical taxi and weather data to visualize past taxi demand and predict future demand. It generates datasets by clustering taxi pickup locations and extracting features from the data, then builds models using techniques like gradient-boosted trees and neural networks. The website integrates these predictions with interactive maps to help the taxi industry optimize operations and better meet customer needs based on past trends.
Talk given by Marat Vyshegorodtsev and Sergey Gorbaty. Enterprise Security team at Salesforce, in January 2017.
Discusses a set of open source tools that analyze the Apex/VisualForce code and advise on its quality.
This document discusses microservices and the process of setting up a new microservice. It covers topics such as defining the service scope, getting approvals, source control and packaging, running environments, logging and monitoring, and preparing the service for production use. The key aspects of setting up a new microservice include buy-in from management, external design reviews, source control and deployment automation, provisioning compute and storage resources, and integrating the service with monitoring and on-call systems.
This document discusses using Apache Zookeeper to orchestrate microservice deployments. It describes how Zookeeper can be used to define service topology, enable one-button deployments through a coordinator service called Maestro, and ensure high availability and failure recovery. The Maestro coordinator initiates and manages deployments by monitoring global state in Zookeeper and determining which nodes to deploy next. Maestro agents on each node receive notifications, create execution plans to deploy updates, and publish status to Zookeeper. Different propagation strategies like canary deployments and rollback capabilities provide health mediation during deployments.
Talk given by Gavin Austin, Principal Technical Writer, and Ted Kuster, Lead Technical Writer, at the STC Silicon Valley meetup in February 2016
Customers no longer have the patience to read online help or user guides. To help customers better understand why they should use a variety of features, and renew their subscription-based apps, Salesforce conducted research to determine the content types that engaged customers most. The result—Salesforce changed its content strategy.
In this session, you’ll learn:
What types of interactive content we’re creating at Salesforce
Why Salesforce moved to interactive content over documentation
How a large company changed its content strategy and how customers responded
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...Alan Dix
Talk at the final event of Data Fusion Dynamics: A Collaborative UK-Saudi Initiative in Cybersecurity and Artificial Intelligence funded by the British Council UK-Saudi Challenge Fund 2024, Cardiff Metropolitan University, 29th April 2025
https://alandix.com/academic/talks/CMet2025-AI-Changes-Everything/
Is AI just another technology, or does it fundamentally change the way we live and think?
Every technology has a direct impact with micro-ethical consequences, some good, some bad. However more profound are the ways in which some technologies reshape the very fabric of society with macro-ethical impacts. The invention of the stirrup revolutionised mounted combat, but as a side effect gave rise to the feudal system, which still shapes politics today. The internal combustion engine offers personal freedom and creates pollution, but has also transformed the nature of urban planning and international trade. When we look at AI the micro-ethical issues, such as bias, are most obvious, but the macro-ethical challenges may be greater.
At a micro-ethical level AI has the potential to deepen social, ethnic and gender bias, issues I have warned about since the early 1990s! It is also being used increasingly on the battlefield. However, it also offers amazing opportunities in health and education, as the recent Nobel prizes for the developers of AlphaFold illustrate. More radically, the need to encode ethics acts as a mirror to surface essential ethical problems and conflicts.
At the macro-ethical level, by the early 2000s digital technology had already begun to undermine sovereignty (e.g. gambling), market economics (through network effects and emergent monopolies), and the very meaning of money. Modern AI is the child of big data, big computation and ultimately big business, intensifying the inherent tendency of digital technology to concentrate power. AI is already unravelling the fundamentals of the social, political and economic world around us, but this is a world that needs radical reimagining to overcome the global environmental and human challenges that confront us. Our challenge is whether to let the threads fall as they may, or to use them to weave a better future.
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul
Artificial intelligence is changing how businesses operate. Companies are using AI agents to automate tasks, reduce time spent on repetitive work, and focus more on high-value activities. Noah Loul, an AI strategist and entrepreneur, has helped dozens of companies streamline their operations using smart automation. He believes AI agents aren't just tools—they're workers that take on repeatable tasks so your human team can focus on what matters. If you want to reduce time waste and increase output, AI agents are the next move.
Dev Dives: Automate and orchestrate your processes with UiPath MaestroUiPathCommunity
This session is designed to equip developers with the skills needed to build mission-critical, end-to-end processes that seamlessly orchestrate agents, people, and robots.
📕 Here's what you can expect:
- Modeling: Build end-to-end processes using BPMN.
- Implementing: Integrate agentic tasks, RPA, APIs, and advanced decisioning into processes.
- Operating: Control process instances with rewind, replay, pause, and stop functions.
- Monitoring: Use dashboards and embedded analytics for real-time insights into process instances.
This webinar is a must-attend for developers looking to enhance their agentic automation skills and orchestrate robust, mission-critical processes.
👨🏫 Speaker:
Andrei Vintila, Principal Product Manager @UiPath
This session streamed live on April 29, 2025, 16:00 CET.
Check out all our upcoming Dev Dives sessions at https://community.uipath.com/dev-dives-automation-developer-2025/.
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPathCommunity
Join this UiPath Community Berlin meetup to explore the Orchestrator API, Swagger interface, and the Test Manager API. Learn how to leverage these tools to streamline automation, enhance testing, and integrate more efficiently with UiPath. Perfect for developers, testers, and automation enthusiasts!
📕 Agenda
Welcome & Introductions
Orchestrator API Overview
Exploring the Swagger Interface
Test Manager API Highlights
Streamlining Automation & Testing with APIs (Demo)
Q&A and Open Discussion
👉 Join our UiPath Community Berlin chapter: https://community.uipath.com/berlin/
This session streamed live on April 29, 2025, 18:00 CET.
Check out all our upcoming UiPath Community sessions at https://community.uipath.com/events/.
Technology Trends in 2025: AI and Big Data AnalyticsInData Labs
At InData Labs, we have been keeping an ear to the ground, looking out for AI-enabled digital transformation trends coming our way in 2025. Our report will provide a look into the technology landscape of the future, including:
-Artificial Intelligence Market Overview
-Strategies for AI Adoption in 2025
-Anticipated drivers of AI adoption and transformative technologies
-Benefits of AI and Big data for your business
-Tips on how to prepare your business for innovation
-AI and data privacy: Strategies for securing data privacy in AI models, etc.
Download your free copy now and implement the key findings to improve your business.
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025BookNet Canada
Book industry standards are evolving rapidly. In the first part of this session, we’ll share an overview of key developments from 2024 and the early months of 2025. Then, BookNet’s resident standards expert, Tom Richardson, and CEO, Lauren Stewart, have a forward-looking conversation about what’s next.
Link to recording, transcript, and accompanying resource: https://bnctechforum.ca/sessions/standardsgoals-for-2025-standards-certification-roundup/
Presented by BookNet Canada on May 6, 2025 with support from the Department of Canadian Heritage.
Mobile App Development Company in Saudi ArabiaSteve Jonas
EmizenTech is a globally recognized software development company, proudly serving businesses since 2013. With over 11 years of industry experience and a team of 200+ skilled professionals, we have successfully delivered 1200+ projects across various sectors. As a leading mobile app development company in Saudi Arabia, we offer end-to-end solutions for iOS, Android, and cross-platform applications. Our apps are known for their user-friendly interfaces, scalability, high performance, and strong security features. We tailor each mobile application to meet the unique needs of different industries, ensuring a seamless user experience. EmizenTech is committed to turning your vision into a powerful digital product that drives growth, innovation, and long-term success in the competitive mobile landscape of Saudi Arabia.
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersToradex
Toradex brings robust Linux support to SMARC (Smart Mobility Architecture), ensuring high performance and long-term reliability for embedded applications. Here’s how:
• Optimized Torizon OS & Yocto Support – Toradex provides Torizon OS, a Debian-based easy-to-use platform, and Yocto BSPs for customized Linux images on SMARC modules.
• Seamless Integration with i.MX 8M Plus and i.MX 95 – Toradex SMARC solutions leverage NXP’s i.MX 8M Plus and i.MX 95 SoCs, delivering power efficiency and AI-ready performance.
• Secure and Reliable – With Secure Boot, over-the-air (OTA) updates, and LTS kernel support, Toradex ensures industrial-grade security and longevity.
• Containerized Workflows for AI & IoT – Support for Docker, ROS, and real-time Linux enables scalable AI, ML, and IoT applications.
• Strong Ecosystem & Developer Support – Toradex offers comprehensive documentation, developer tools, and dedicated support, accelerating time-to-market.
With Toradex’s Linux support for SMARC, developers get a scalable, secure, and high-performance solution for industrial, medical, and AI-driven applications.
Do you have a specific project or application in mind where you're considering SMARC? We can help with a free compatibility check and a quick time-to-market.
For more information: https://www.toradex.com/computer-on-modules/smarc-arm-family
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveScyllaDB
Want to learn practical tips for designing systems that can scale efficiently without compromising speed?
Join us for a workshop where we’ll address these challenges head-on and explore how to architect low-latency systems using Rust. During this free interactive workshop aimed at developers, engineers, and architects, we’ll cover how Rust’s unique language features and the Tokio async runtime enable high-performance application development.
As you explore key principles of designing low-latency systems with Rust, you will learn how to:
- Create and compile a real-world app with Rust
- Connect the application to ScyllaDB (NoSQL data store)
- Negotiate tradeoffs related to data modeling and querying
- Manage and monitor the database for consistently low latencies
Procurement Insights Cost To Value Guide.pptxJon Hansen
Procurement Insights, with its integrated historic procurement industry archives, serves as a powerful complement — not a competitor — to other procurement industry firms. It fills critical gaps in depth, agility, and contextual insight that most traditional analyst and association models overlook.
Learn more about this value-driven proprietary service offering here.
TrsLabs - Fintech Product & Business ConsultingTrs Labs
Hybrid Growth Mandate Model with TrsLabs
Strategic investments, inorganic growth, and business model pivoting are critical activities that businesses don't undertake every day. In cases like these, it may benefit your business to engage a temporary external consultant.
An unbiased plan, driven by clear-cut deliverables and market dynamics and free from the influence of your internal office politics, empowers business leaders to make the right choices.
Getting things done within budget and on time is key to growing a business, whether you are a start-up or a big company.
Talk to us & Unlock the competitive advantage
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...SOFTTECHHUB
I started my online journey with several hosting services before stumbling upon Ai EngineHost. At first, the idea of paying one fee and getting lifetime access seemed too good to pass up. The platform is built on reliable US-based servers, ensuring your projects run at high speeds and remain safe. Let me take you step by step through its benefits and features as I explain why this hosting solution is a perfect fit for digital entrepreneurs.
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfAbi john
Analyze the growth of meme coins from mere online jokes to potential assets in the digital economy. Explore the community, culture, and utility as they elevate themselves to a new era in cryptocurrency.
This is the keynote of the Into the Box conference, highlighting the release of the BoxLang JVM language, its key enhancements, and its vision for the future.
How Can I use the AI Hype in my Business Context?Daniel Lehner
Is AI just hype? Or is it the game changer your business needs?
Everyone’s talking about AI but is anyone really using it to create real value?
Most companies want to leverage AI. Few know how.
✅ What exactly should you ask to find real AI opportunities?
✅ Which AI techniques actually fit your business?
✅ Is your data even ready for AI?
If you’re not sure, you’re not alone. This is a condensed version of the slides I presented at a Linkedin webinar for Tecnovy on 28.04.2025.
2. Safe Harbor
Safe harbor statement under the Private Securities Litigation Reform Act of 1995:
This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-looking statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of product or service availability, subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of management for future operations, statements of belief, and any statements concerning new, planned, or upgraded services or technology developments and customer contracts or use of our services.
The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality for our service, new products and services, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of any litigation, risks associated with completed and any possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling non-salesforce.com products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the financial results of salesforce.com, inc. is included in our annual report on Form 10-K for the most recent fiscal year and in our quarterly report on Form 10-Q for the most recent fiscal quarter. These documents and others containing important disclosures are available on the SEC Filings section of the Investor Information section of our Web site.
Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.
5. A. Why HBase?
B. Interacting with the open source community
C. HBase at Salesforce
6. Size Matters*
New Salesforce customer:
•“How many rows do you have?”
•We will turn folks away if they have too many!
Data Storage is expensive:
•SAN storage
•Relational Database
•Too many rows → too expensive
* In a relational world
7. What if in the future we:
… and have cheaper storage?
… and never need to ask again about the number of rows?
… grow with the data by just adding more machines?
(Disclaimer: no transactions, no joins, no secondary indexes, …)
8. (A quick note about) Relational Databases
• We love them. They are core to our infrastructure.
• SQL and NoSQL/NoACID are complementary.
• (Almost) everything we do is SQL based (see Phoenix – the SQL layer for HBase.)
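A minimal sketch of what "SQL based via Phoenix" looks like in practice: Phoenix ships a JDBC driver, so a query against HBase is ordinary java.sql code. The connection string, table, and column names below are illustrative assumptions, not details from the talk.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PhoenixQuerySketch {
    public static void main(String[] args) throws Exception {
        // The Phoenix JDBC URL points at the cluster's ZooKeeper quorum (host and port are placeholders).
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
             PreparedStatement stmt = conn.prepareStatement(
                     "SELECT record_id, changed_at FROM ENTITY_HISTORY WHERE org_id = ? LIMIT 10")) {
            stmt.setString(1, "00D000000000001");
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    // Phoenix upper-cases unquoted identifiers, so read them back in upper case.
                    System.out.println(rs.getString("RECORD_ID") + " " + rs.getTimestamp("CHANGED_AT"));
                }
            }
        }
    }
}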
9. The Search - Requirements
• Consistent
– “Eventually consistent stores are 100% consistent 99% of the time” – Ian Varley
• Scalable
– No “features” impeding horizontal scaling
• Persistent
– Duh...?
• Key lookups
• Range lookups
• Open source (ASL great, GPLv2 OK, GPLv3/AGPL not acceptable)
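The key-lookup and range-lookup requirements above map directly onto the HBase client API's Get and Scan operations. A hedged sketch follows; the table name and row-key scheme are invented for illustration, and withStartRow/withStopRow assume a recent (2.x-era) HBase client.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class LookupSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("entity_history"))) {

            // Key lookup: fetch exactly one row by its row key.
            Result row = table.get(new Get(Bytes.toBytes("org1|record42")));
            System.out.println("point lookup: " + row);

            // Range lookup: scan every row whose key falls in [startRow, stopRow).
            Scan scan = new Scan()
                    .withStartRow(Bytes.toBytes("org1|"))
                    .withStopRow(Bytes.toBytes("org1|~")); // '~' sorts after the key characters used here
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result r : scanner) {
                    System.out.println("range lookup: " + r);
                }
            }
        }
    }
}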
12. To Fork or not to Fork – that is the question
Fork - pros
• Agility. No waiting for community review. Just get stuff done
• Freedom. Patches that might not be acceptable to the community
Fork - cons
• Lose out on community work
• Patches not useful to other parties
There is no right or wrong. It’s a matter of choice, taste, and requirements.
13. HBase Development @ Salesforce
• No fork of HBase.
• No fork of HBase.
• Internal HBase/HDFS branch for possible emergency fixes
• All fixes are cleaned and contributed back
• We switch to the next open source point release periodically
24. Salesforce is a Database
[Diagram: the classic query-processing pipeline. A SQL query goes to the Query Parser; the parsed query goes to the Query Optimizer (Plan Generator and Plan Cost Estimator), which produces an evaluation plan for the Query Plan Evaluator; the optimizer consults the System Catalog and Database Stats covering tables, columns, and indexes.]
25. Salesforce is a Database
[Diagram: the same pipeline applied to Salesforce. A SOQL query goes to the Query Parser; the parsed query goes to the Query Optimizer (Plan Generator and Plan Cost Estimator), which emits hinted Oracle SQL that Oracle executes; the optimizer consults the System Catalog and Database Stats covering objects, fields, and indexes.]
28. pod = a database instance
•Oracle RAC
•AppServers
•Blob store servers
•Search servers
•Shared SAN storage
•SAN replication for DR
[Diagram: inside a pod, app servers connect over SQL/JDBC to an Oracle RAC cluster of Oracle nodes backed by shared SAN storage at the primary site; SAN replication mirrors the data to the SAN at a secondary site for DR.]
30. Where does HBase Fit?
[Diagram: the SOQL pipeline from the previous slides, extended. The Query Optimizer still emits hinted Oracle SQL for Oracle, but queries can also reach HBase clusters via (1) External Objects or (2) Phoenix SQL; the optimizer consults the System Catalog and Database Stats covering objects, fields, and indexes.]
31. Where does HBase Fit?
•Separate HBase per pod (close to 50 clusters)
•Logically co-located with Oracle
•Small clusters striped across five racks
•Each cluster’s master service on a different rack
•Identical cluster for DR
[Diagram: within a pod, app servers talk to the Oracle cluster and, via Phoenix over SQL/JDBC, to the pod's HBase cluster of HBase nodes, all backed by SAN at the primary site; decentralized HBase replication keeps an identical DR HBase cluster at the secondary site.]
33. 1. Audit Trails (Entity History)
• Identity managed in RDBMS
• Indexed in HBase (Phoenix indexes)
• Historical, immutable data only
• No need to reason about updates, split identities, and transactions
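A hedged sketch of how such an append-only history table and a Phoenix secondary index might be declared, given the bullets above (identity in the RDBMS, immutable history indexed in HBase via Phoenix). The DDL goes over the same JDBC connection as any query; the table, column, and index names are assumptions for illustration, and IMMUTABLE_ROWS=true is the Phoenix table property for write-once data, which keeps index maintenance cheap.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class AuditTrailSchemaSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
             Statement stmt = conn.createStatement()) {
            // Write-once history rows: no updates, so Phoenix can maintain indexes cheaply.
            stmt.execute(
                "CREATE TABLE IF NOT EXISTS ENTITY_HISTORY (" +
                "  org_id CHAR(15) NOT NULL," +
                "  record_id CHAR(15) NOT NULL," +
                "  changed_at TIMESTAMP NOT NULL," +
                "  field_name VARCHAR," +
                "  old_value VARCHAR," +
                "  new_value VARCHAR," +
                "  CONSTRAINT pk PRIMARY KEY (org_id, record_id, changed_at)" +
                ") IMMUTABLE_ROWS=true");
            // Secondary index supporting 'what changed in this org, ordered by time' queries.
            stmt.execute(
                "CREATE INDEX IF NOT EXISTS ENTITY_HISTORY_BY_TIME " +
                "ON ENTITY_HISTORY (org_id, changed_at)");
            conn.commit();
        }
    }
}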
34. 2. Archiving (Data Lifecycle Management)
• Objects (rows) moved to HBase
• Identity managed in HBase after move
• Data immutable in HBase
• No Transactions
35. 3. Live data in HBase (BigObjects)
• Mutable data (possibly)
• Everything managed in HBase
• Still no Transactions, yet
• Platform for other teams to use
36. Merrill Lynch Rationalization: Data Governance, Audit & Archive
• First Salesforce Enterprise Customer
• On-platform archival compelling versus the on-premise solution from Informatica
• Retention requirements for 7 years
Merrill Lynch
“Data audit, governance & lifecycle management is critical for Merrill; for the entire banking & financial industry it has become a benchmark requirement.”
37. Heating, ventilation, and air-conditioning in the EU
• Top 10 Platform Users
• Subject to highly variable data governance and retention requirements
• Significant SAP footprint driving business rules – need to connect that to Salesforce data for archival and data retention needs
• Massive service workforce generates significant data processing challenges
“The Salesforce.com Platform roadmap for Data Archive is critical for future data management needs”
Michael Roehr, CTO, Vaillant
38. BMW Enriches Their Customer Perspective
• Sales Cloud available across all German Dealership Franchises
• All customer data subject to stringent, government-mandated protection, audit & retention
• Correlations with Car Builder app data enable more contextual customer interactions
• Car telemetry, used correctly, helps refine product evolution and customer-needs alignment
“Data-driven customer engagement is a key driver for our enhanced customer experience.”
41. Highly Available, Disaster Recovery
• Five-peer ZooKeeper quorum
• Five quorum JournalNodes (for filesystem edits)
• Five HMasters
• Three NameNodes (yes, three, we made a patch to run more than one standby)
• HBase Replication to identical hot standby pod in a different data center
– In the event of a disaster we fail a complete pod to the secondary site
• Weekly automated, unattended rolling restarts
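For the cross-data-center replication bullet, here is a rough administrative sketch, assuming an HBase 2.x client API; the peer id, ZooKeeper quorum, and table name are placeholders, and this is not the exact Salesforce tooling.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.replication.ReplicationPeerConfig;

public class ReplicationPeerSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            // Register the standby pod as a replication peer, keyed by its ZooKeeper quorum and hbase znode.
            ReplicationPeerConfig peer = ReplicationPeerConfig.newBuilder()
                    .setClusterKey("dr-zk1,dr-zk2,dr-zk3:2181:/hbase")
                    .build();
            admin.addReplicationPeer("dr_pod", peer);
            // Flag a table's column families for replication (schemas must exist on both clusters).
            admin.enableTableReplication(TableName.valueOf("entity_history"));
        }
    }
}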
43. Monitoring & Management (M&M)
• Nagios alerts
• Trending via OpenTSDB, with a custom UI on top of the time-series data.
• Rolling upgrades
– Eventually scheduled and unattended
• Absolutely no unscheduled downtime.
Not even during a rack failure.
44. A. Why HBase?
B. Interacting with the open source community
C. HBase at Salesforce
#2: Spent time with StumbleUpon, Facebook, many others. This is a great community.
#3: Salesforce is seeing the center of gravity of customer data increasingly shift. Driving this forward across verticals such as banking & finserv requires data audit driven by post-2008 regulatory requirements and Sarbanes-Oxley requirements. As this data is generated in a transactional environment, we use HBase as our historical and immutable storage.
#4: Their use of the Salesforce.com platform to drive their entire business helps keep their dynamic and highly mobile workforce in touch with their data. Given their operating environment in Germany, they are required to deliver a complete data audit and use Field History for this. They are also required to keep all customer data for at least 15 years, which is why Archive is so key for them.
#5: Across Germany we've had a successful deployment in each franchise to establish new baselines in customer interactions with BMW customers, leases, and service interactions. Looking beyond this use case, the capability of marrying together the customer data generated by the BMW Car Builder application with cleansed and anonymized telemetry data is pushing Salesforce to deliver the concepts and tools that allow BMW to absorb the full spectrum of their customer event data stream and take business actions on it. Imagine how I would feel as a prospective customer if I walked into a dealership and they had a more informed knowledge of who I am and my likely preferences. We are using the notion of BigObjects to absorb, store, and act on the data that is behind the Internet of Customers.
#5: Across Germany we've had a successful deployment in each franchise to establish new base lines in customer interactions with BMW customers, leases and service interactions. Looking beyond this usecase the capability of marrying together the customer data generated for the BMW Car Builder application and cleansed and anonymizedtelemetrics data is pushing Salesforce to deliver the concepts and tools to allow BMW to absorb the full spectrum of their customer event data stream, and take business actions on it.Imagine how I would feel as a prospective customer if I walked into a dealership and they have a more informed knowledge of who I am and my likely preferences. We are using the notion of BigObjects to absorb, store and act on the data that is behind the Internet of Customers.