Starting from the persistence needs of an API PaaS, we'll explain how we selected Cassandra and, finally, DSE Search, the main challenges we faced in terms of both development and operations, and the solutions we implemented.
NATS in action - A Real time Microservices Architecture handled by NATS | Raül Pérez
The document describes an architecture for managing infrastructure and platforms using microservices that communicate over NATS. Key points:
- Ernest is an IAAS+PAAS hybrid cloud platform that uses microservices and NATS to manage infrastructure resources, deploy applications, and automate scaling across multiple cloud providers.
- NATS is used as the central communication system between Ernest microservices to process user-defined workflows for building environments.
- Workflows define things like networks, virtual machine instances, configuration, and can deploy and provision applications. This allows Ernest to automate the creation and management of environments.
What's New and Next in OpenNTF Domino API (ICON UK 2014) | Paul Withers
- The document summarizes the presentation "What's New And Next in OpenNTF Domino API" given by Paul Withers.
- It describes recent enhancements to the OpenNTF Domino API including improvements to logging, database methods, document serialization, and email functionality.
- Future plans include expanding the XOTS task framework, graph database support, classes to represent all design elements, and potential integration with administrative functions.
OpenStack is proven open source software for creating private and public clouds. It is used by a very large ecosystem of companies to run their businesses every day. This talk is an introduction to OpenStack and covers the following:
- What is OpenStack
- Who is involved and who uses it
- Projects under the OpenStack umbrella
- OpenStack architecture(s)
- OpenStack releases
- How to contribute to OpenStack
- Q & A
At this meeting we'll host a discussion on Google Cloud Platform and Amazon Web Services to shed light on the similarities and differences between the platforms. If you have questions about how our platforms compare, this is the meeting to attend!
The document summarizes an update on the OpenDJ project from Ludovic Poitou of ForgeRock. OpenDJ is an open source LDAPv3 directory server project initiated by Sun in 2006 and now led by ForgeRock. Version 2.6.0 of OpenDJ includes REST to LDAP integration, pass-through authentication to Active Directory, packaging improvements, and LDAP client API enhancements. Planned future enhancements include refactoring the server code, adding proxy functionality, improving scalability and multi-tenancy, and strengthening password policy support.
This document discusses OpenDaylight's implementation of OpenFlow clustering. Key points:
- OpenDaylight uses components like the OpenFlow plugin, Entity Ownership Service, and clustered datastore to implement clustering.
- Clustering provides high availability (via controller failover) and scalability. OpenFlow 1.3+ supports master/slave roles to facilitate failover.
- The Entity Ownership Service elects a master controller for each switch using strategies like first candidate or least loaded. It notifies of ownership changes.
- Challenges include race conditions, switch connection flapping, partitioning, and scale. Areas of future work are listed.
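The two election strategies named above can be illustrated with a minimal sketch. This is not OpenDaylight code: the `Controller` class and the functions `elect_first_candidate`, `elect_least_loaded`, `assign_owner` and `fail_over` are hypothetical names chosen for illustration, and the model assumes a single process with no partitions or races.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Hypothetical stand-in for a controller node in the cluster."""
    name: str
    owned: set = field(default_factory=set)


def elect_first_candidate(candidates):
    """First-candidate strategy: the earliest registered candidate wins."""
    return candidates[0]


def elect_least_loaded(candidates):
    """Least-loaded strategy: the candidate owning the fewest switches wins."""
    return min(candidates, key=lambda c: len(c.owned))


def assign_owner(switch, candidates, strategy):
    """Elect an owner (master) for one switch and record the ownership."""
    owner = strategy(candidates)
    owner.owned.add(switch)
    return owner


def fail_over(failed, candidates, strategy):
    """Re-elect owners for every switch the failed controller owned."""
    survivors = [c for c in candidates if c is not failed]
    moved = {}
    for switch in sorted(failed.owned):
        moved[switch] = assign_owner(switch, survivors, strategy).name
    failed.owned.clear()
    return moved


controllers = [Controller("c1"), Controller("c2"), Controller("c3")]
owners = {
    sw: assign_owner(sw, controllers, elect_least_loaded).name
    for sw in ["s1", "s2", "s3", "s4"]
}
moved = fail_over(controllers[0], controllers, elect_least_loaded)
```

With the least-loaded strategy the four switches spread across the three controllers, and when `c1` fails its switches are redistributed to the survivors. Passing `elect_first_candidate` instead would place every switch on the first surviving candidate.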
The document summarizes upcoming changes and improvements to Cincom Smalltalk products, including the Foundation, ObjectStudio, and VisualWorks. Key changes include modernizing the text editor, source code editor, and user interface with new "millennial" versions. The Foundation will see improvements to tools like SiouX and AppeX, as well as updated PostgreSQL drivers. ObjectStudio is focusing on next generation user interface integration. VisualWorks will improve skinning and layout functionality. Future plans include additional framework updates across products.
This document provides an overview of recent developments with the OpenLDAP project. It discusses the adoption of the Lightning Memory-Mapped Database (LMDB) which has improved performance and efficiency both within OpenLDAP and for other projects. It also outlines new work on the HyperDex clustered backend and Samba4/Active Directory integration. While performance gains have been made, more work remains, including deprecating the old BerkeleyDB backends and improving transaction support.
Presented at the MySQL Chicago Meetup in August 2016. The focus of the talk is on backups and verification, replication and failover, as well as security and encryption.
Presentation from a talk given by Diogo Monteiro (@diogogmt) at a recent NATS Meetup in Toronto. The talk covered why NATS is a simple, fast method for microservices communication, and provides some latency benchmarks from Diogo's design of a solution using NATS.
You can learn more about NATS at http://www.nats.io
Architectural caching patterns for kubernetes | Rafał Leszko
The document discusses various architectural caching patterns for Kubernetes, including embedded, embedded distributed, client-server, cloud, sidecar, reverse proxy, and reverse proxy sidecar caching. It provides examples of implementing each pattern using Hazelcast and discusses the pros and cons of each approach.
The document provides an overview of Apache ManifoldCF, an open source framework for connecting content repositories to search indexes. It describes ManifoldCF's capabilities, including crawling repositories to index their contents and pushing those contents to search servers. It details the key components of ManifoldCF, such as the Pull Agent Daemon, jobs, connectors, and the monitoring UI. The document also outlines ManifoldCF's history and major releases, from its incubation at Apache to becoming a top-level project.
The Proxy Wars - MySQL Router, ProxySQL, MariaDB MaxScale | Colin Charles
This document discusses MySQL proxy technologies including MySQL Router, ProxySQL, and MariaDB MaxScale. It provides an overview of each technology, including when they were released, key features, and comparisons between them. ProxySQL is highlighted as a popular option currently with integration with Percona tools, while MySQL Router may become more widely used due to its support for MySQL InnoDB Cluster. MariaDB MaxScale is noted for its binlog routing capabilities. Overall the document aims to help people understand and choose between the different MySQL proxy options.
This document introduces Minoru Osuka and provides information about ManifoldCF and Solr. It notes that Minoru is a committer and PMC member of ManifoldCF at the Apache Software Foundation and a senior consultant. It then provides an overview of what ManifoldCF is, its project status, architecture, use cases, resources, books, and a demonstration. It concludes by announcing that Minoru's company is now hiring.
Approaching a platform transition requires proper planning and execution on top of a serious technical architecture. In this webinar, we’ll discuss the migration of NYSenate.gov from three unique perspectives: the lead developer, the project manager and the platform provider.
Join Brad MacDonald and Derek Reese of Mediacurrent and Erik Mathy of Pantheon for the in-depth technical notes of this project and a discussion on the future of democratic development.
The document discusses OSGeo's incubation process for open source geospatial software projects. It provides updates on projects that have graduated from incubation, are currently in incubation, and projects that are pending graduation or not communicating. It asks how readers can help provide support to incubation projects that are stuck or lacking communication.
Lukáš Malý - Log management ELISA controlled by Zabbix | ZabConf2016 | Zabbix
Datasys ELISA log management is a robust, powerful, yet inexpensive solution for the collection, correlation and analysis of logs. The core system consists of the Elasticsearch "NoSQL" database and the Kibana web user interface, which makes it convenient to analyze detected security incidents and relevant logs. The Elasticsearch database is commonly distributed across multiple servers to achieve load balancing and high availability of indexed data. ELISA relies heavily on ZABBIX for user authentication and role-based access control, notifications and self-monitoring. Elasticsearch indices can be managed directly in the ZABBIX frontend. ZABBIX "trapper" items and monitoring templates are used to centrally manage the configuration of a distributed environment of NXLog agents. Agents can securely auto-register as ZABBIX "hosts".
Architectural caching patterns for kubernetes | Rafał Leszko
The document summarizes several common architectural caching patterns for Kubernetes:
1. Embedded caching stores cache data directly within application processes or containers.
2. Client-server caching moves the cache to a separate server or service, accessed by applications via the network.
3. Sidecar caching co-locates an independent caching container alongside application containers to provide low-latency access while separating concerns.
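The difference between the first two patterns can be sketched in a few lines. This is an illustrative model only: `EmbeddedCache`, `CacheServer`, `RemoteCacheClient` and `cached_lookup` are hypothetical names, and the "server" is an in-memory object standing in for a real networked cache service so that the example stays self-contained.

```python
class EmbeddedCache:
    """Pattern 1: the cache lives inside each application process."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value


class CacheServer:
    """Stand-in for a separate cache service shared by all applications."""

    def __init__(self):
        self.data = {}


class RemoteCacheClient:
    """Pattern 2: each application reaches the shared cache via a client.

    A real client would make a network call in get() and put().
    """

    def __init__(self, server):
        self._server = server

    def get(self, key):
        return self._server.data.get(key)

    def put(self, key, value):
        self._server.data[key] = value


def cached_lookup(cache, key, load):
    """Return the cached value, calling the slow loader only on a miss."""
    value = cache.get(key)
    if value is None:
        value = load(key)
        cache.put(key, value)
    return value


loads = []  # records every call to the slow backend


def slow_load(key):
    loads.append(key)
    return key.upper()


# Two application instances, each with its own embedded cache.
app_a, app_b = EmbeddedCache(), EmbeddedCache()
cached_lookup(app_a, "user", slow_load)
cached_lookup(app_b, "user", slow_load)
embedded_loads = len(loads)

# Two application instances sharing one cache server.
loads.clear()
server = CacheServer()
client_a, client_b = RemoteCacheClient(server), RemoteCacheClient(server)
cached_lookup(client_a, "user", slow_load)
cached_lookup(client_b, "user", slow_load)
shared_loads = len(loads)
```

In this sketch each embedded cache misses independently, so the backend is hit once per instance, while the shared server is populated once and reused. A sidecar deployment would use the same client shape, with the cache container running in the same pod as the application.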
Introducing the Microservices Reference Architecture Version 1.2 | NGINX, Inc.
About the webinar
Application development using microservices is changing very quickly, even as many organizations are gearing up to produce their first full-fledged microservices apps or to expand microservices development. Among these changes are the emergence of Kubernetes as the most widely used approach to container management and the arrival of service mesh architectures. The Istio service mesh architecture has reached version 1.0.
There is also an increasing recognition of the need for security in service-to-service communications. In the upcoming Version 1.2 of the Microservices Reference Architecture, NGINX will offer an update to its robust and flexible array of models for microservices development, giving developers much more choice and the opportunity to “right-size” the microservices model they choose to the task at hand, while preserving the opportunity for future growth.
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month | Nicolas Brousse
TubeMogul grew from a few servers to over two thousand servers handling over one trillion HTTP requests a month, each processed in less than 50 ms. To keep up with this fast growth, the SRE team had to implement an efficient Continuous Delivery infrastructure that supported over 10,000 Puppet deployments and 8,500 application deployments in 2014. In this presentation, we will cover the nuts and bolts of the TubeMogul operations engineering team and how they overcome challenges.
Dynamic routing in microservice oriented architecture | Daniel Leon
When an application is split into different microservices and each application's access URL is dynamically generated, all hell breaks loose. If you are tired of manually setting routes in your nginx, come see linkerd in action.
As organisations store more and more information in their Alfresco content hubs, search and discovery of content becomes important. Alfresco comes bundled with Apache Lucene and Apache Solr for search. Although these provide full-text capabilities, they do not have the scalability and functionality of newer cloud-scalable search software such as Apache Solr Cloud 4, Elastic Search and Amazon Cloud Search. Also, searching across multiple Alfresco instances, including Alfresco Cloud, is quite a challenge, and none of the possible approaches is good enough to be production ready.
This talk shows you how to index and search content stored in one or more Alfresco repositories, other CMIS repositories or file systems using either Apache Solr Cloud 4, Elastic Search or Amazon Cloud Search, while still ensuring the confidentiality of the documents based on the permissions configured in Alfresco or any other repositories.
Leonid Vasilyev "Building, deploying and running production code at Dropbox" | IT Event
Reproducible builds and a fast, safe deployment process, together with self-healing services, form the basis of a stable and maintainable infrastructure. In this talk I’d like to cover, from the Site Reliability Engineering (SRE) perspective, how Dropbox addresses the above challenges, what technologies are used and what lessons were learnt during the implementation process.
[POSS 2019] OVirt and Ceph: Perfect Combination.? | Worteks
This document discusses using the open-source virtualization platform oVirt together with the distributed storage platform Ceph. It evaluates three options for combining oVirt and Ceph that provide high availability and scalability without single points of failure. The recommended option uses CephFS to provide POSIX-compliant storage for virtual machines hosted on an oVirt cluster, along with GlusterFS for the oVirt hosted engine and Ceph monitor nodes. This allows up to seven hosts without any single points of failure.
This presentation deals with logging in the course of mobile development, describing an open source logging environment built with the ELK stack (ElasticSearch, Logstash and Kibana).
Presentation by Igor Rudyk (Software Engineer, GlobalLogic, Lviv), delivered at Mobile TechTalk Lviv on April 28, 2015.
More details - http://globallogic.com.ua/mobile-techtalk-lviv-2015-report
Assume you have a Cassandra cluster with hundreds of tables, and one day the latency of client requests and the CPU utilization of the Cassandra process become unacceptable. Our team regularly faced this problem. A simple look at the metrics did not help us: because of high CPU utilization, we saw bad metrics across almost all of our tables. In this talk I will discuss ways to find out which table (or tables) is the most problematic for the cluster and causes problems for other tables. The talk will be fully practical, as I will walk through the real steps of our investigation. Some of the steps were successful, others were not. In the end we reduced both latency and CPU utilization by a factor of about 10 without adding nodes or hardware resources.
This document discusses real time analytics using Spark and Spark Streaming. It provides an introduction to Spark and highlights limitations of Hadoop for real-time analytics. It then describes Spark's advantages like in-memory processing and rich APIs. The document discusses Spark Streaming and the Spark Cassandra Connector. It also introduces DataStax Enterprise which integrates Spark, Cassandra and Solr to allow real-time analytics without separate clusters. Examples of streaming use cases and demos are provided.
MuleSoft Connect 2016 - Getting started with RAML using Restlet’s visual desi... | Restlet
In this presentation by Jerome Louvel, Restlet's Founder and Chief Geek, discover the Restlet Studio and get a glimpse of the Restlet platform's capabilities. Learn about API project styles and collaborative API-first design.
MyDrive Solutions: Case Study: Troubleshooting Production Issues as a Developer | DataStax Academy
This talk will be a step by step walkthrough of a developer troubleshooting a real performance issue we had at MyDrive, from the very first steps diagnosing the symptoms, through looking at metric charts down to CQL queries, the Ruby CQL driver, and Ruby code profiling.
AdStage: Monacella: An Relational Object Database using Cassandra as the Data... | DataStax Academy
At AdStage we have a large volume of data about ads and their relationships: campaigns, ad groups, keywords, bids, budgets, targeting info - the list goes on. We started out storing all this data in Postgres, but even before we reached public beta we were already putting too much strain on the largest Postgres instance we could run. We considered sharding, but instead we decided to embark on a project to store our data in Cassandra. Now, after more than a year of development, we present Monacella, a relational object database that uses Cassandra as the datastore. In this talk we'll examine the architecture of Monacella, its features and use cases, and plans for future development.
Capital One: Using Cassandra In Building A Reporting Platform | DataStax Academy
As a leader in the financial industry, Capital One applications generate huge amounts of data that require fast and accurate handling, storage and analysis. We are transforming how we report operational data to our internal users so that they can make quick and precise business decisions to serve our customers. As part of this transformation, we are building a new Go-based data processing framework that will enable us to transfer data from multiple data stores (RDBMS, files, etc.) to a single NoSQL database - Cassandra. This new NoSQL store will act as a reporting database that will receive data on a near real-time basis and serve the data through scorecards and reports. We would like to share our experience in defining this fast data platform and the methodologies used to model financial data in Cassandra.
DataStax: Testing Cassandra Guarantees Under Diverse Failure Modes With Jepsen | DataStax Academy
The increasing prevalence of large-scale distributed systems necessitates careful testing and understanding of the invariants and guarantees at play. In particular, Kyle Kingsbury's "Call Me Maybe" series has increased awareness of this need for developers and administrators alike. In this talk, Joel will discuss these issues in the context of his efforts as an intern at DataStax to develop extensive testing coverage via Kingsbury's Jepsen library.
DataStax: The Cassandra Validation Harness: Achieving More Stable Releases | DataStax Academy
As of 3.0, Apache Cassandra will be moving to a tick-tock release cycle to improve the stability problems that have plagued new major releases. Alongside that, we need to expand beyond our existing test infrastructure, and begin simulating production environments. We'll explore the Cassandra Validation Harness, a new testing infrastructure for improving our detection rate of resource leaks, concurrency problems, and other bugs that only show up at scale.
The Last Pickle: Repeatable, Scalable, Reliable, Observable: Cassandra | DataStax Academy
Apache Cassandra makes it possible to write code on a laptop and deploy to multi-region clusters with a few configuration changes. But what does it take to create repeatable, scalable, reliable, and observable clusters?
In this talk Aaron Morton, Co-Founder at The Last Pickle and Apache Cassandra Committer, will discuss the tools and techniques they use. From environment planning to implementation with tools such as Chef, Sensu, Graphite, Riemann and LogStash, this will be a discussion of the full stack ecosystem for successful projects.
Silicon Valley Data Science: From Oracle to Cassandra with Spark | DataStax Academy
The document describes Allant Group's process of migrating their customer recognition application from an Oracle database with JMS messaging to Cassandra with Spark. They introduced Cassandra to replace the Oracle database, but needed to re-architect the application layer to improve scalability and throughput. Allant employed Hadoop and then Spark with the Datastax Cassandra connector to distribute the application logic. Benchmark testing showed throughput increased from 2.5 million to over 125 million records per hour, reducing processing time from 6-7 days to just 3 hours. The migration enabled elastic scaling, higher data throughput, and lower costs while maintaining the existing application logic.
AddThis: Scaling Cassandra up and down into containers with ZFS | DataStax Academy
ZFS is an advanced file, RAID, and volume management system originally developed by Sun Microsystems. Billed as 'The Last Word in File Systems', it was unavailable on Linux until recently. AddThis uses ZFS to more effectively scale up dedicated hardware, getting twice the performance at half the cost. ZFS is also fundamental to containerization, allowing nodes from multiple clusters to be co-located with safe persistent storage.
Cassandra Summit 2014: META - An Efficient Distributed Data Hub with Batch an... | DataStax Academy
Stratio Crossdata is a distributed data platform that allows for both batch and streaming queries across multiple data stores. It uses Spark to enable operations not natively supported and provides connectors to integrate different data sources. The platform aims to simplify deployment, administration and querying for clients through its metadata management and support for features like full text search, joins and streaming queries.
Cassandra Summit 2014: Social Media Security Company Nexgate Relies on Cassan... | DataStax Academy
Presenter: Harold Nguyen, Senior Data Scientist at Nexgate
In this talk, we focus on a use case by showing how Cassandra can detect spam and spammers on social media. We also show how we use Cassandra to train our 100+ social-media-security classifiers. The accuracy of any security product is directly tied to the breadth of the corpus of data upon which it is built. For Nexgate, this means that the success of our products is inextricably tied to our ability to save everything we've ever scanned, but in a way that is still readily accessible. In the days before NoSQL, this was hard. This talk is about how Datastax and Cassandra make it easy.
This document proposes an email app that models conversations and topics as the core data structures. It outlines two data models - one where conversations are grouped by a hash of recipients and topics represent conversation threads, and another where conversations are grouped by a hash of recipients and emails are attached directly to conversations. The data models are designed for an app built using Scala, C*, AWS, and Spark.
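The two data models can be sketched as plain in-memory structures. This is a minimal illustration, not the proposed app's schema: `conversation_id`, `by_topic`, `flat` and `store` are hypothetical names, SHA-256 is an arbitrary choice of hash, and hashing the full participant set (sender plus recipients) is an assumption made so that replies land in the same conversation.

```python
import hashlib
from collections import defaultdict


def conversation_id(participants):
    """Hash the normalized participant set so ordering and case do not matter."""
    normalized = sorted({p.strip().lower() for p in participants})
    return hashlib.sha256("\n".join(normalized).encode("utf-8")).hexdigest()


# Model 1: conversation -> topic (thread) -> emails
by_topic = defaultdict(lambda: defaultdict(list))

# Model 2: conversation -> emails attached directly
flat = defaultdict(list)


def store(email):
    """File one email under both models and return its conversation key."""
    cid = conversation_id([email["sender"], *email["recipients"]])
    by_topic[cid][email["subject"]].append(email["body"])
    flat[cid].append(email["body"])
    return cid


first = store({
    "sender": "alice@example.com",
    "recipients": ["bob@example.com", "carol@example.com"],
    "subject": "Lunch",
    "body": "Noon on Friday?",
})
second = store({
    "sender": "bob@example.com",
    "recipients": ["Alice@example.com", "carol@example.com"],
    "subject": "Lunch",
    "body": "Works for me.",
})
```

Both emails resolve to the same conversation key because the participant set is identical. In a Cassandra-backed version, the conversation key would be a natural candidate for the partition key, though the source does not specify the table layout.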
Cassandra Summit 2014: A Train of Thoughts About Growing and Scalability - Bu... | DataStax Academy
Presenter: Eiti Kimura, Senior Software Engineer at Movile
Movile adopted Apache Cassandra in 2009, and it became a fundamental piece of the robust, scalable architecture that supports more than 50 products used by over 200 million users in Latin America. In this talk we present the architecture of our ring, configuration details, detailed tuning, the hardware used to achieve our performance requirements (on the order of a few milliseconds), information storage strategies for network and disk space optimization, and best practices, and we show how simple systems evolved into scalable, distributed platforms. We started our cluster with a relatively low number of nodes (6) using commodity hardware to support critical high-performance applications. After this talk, you'll understand how Apache Cassandra was essential to evolving our systems and supporting the growth of our business. Movile is the leading mobile content company in Latin America. Movile’s products include mobile content, mobile TV, mobile learning, mobile games, mobile payment, mobile marketing and mobile commerce. Every month, it publishes content and services to more than 20 million mobile customers. It has grown substantially over the last few years (with a more than 25-fold increase in its revenue over the last five years) both organically and through an aggressive M&A strategy, including five acquisitions in the last five years. Movile is positioning itself as a kind of Silicon Valley company based in Brazil. For the last two years, Movile has been named in the “Great Place to Work” list for technology companies in Brazil. The company shareholders include the founders of the company plus Naspers, a South African media conglomerate.
Cassandra Summit 2014: Cassandra in Large Scale Enterprise Grade xPatterns De...DataStax Academy
Presenter: Claudiu Barbura, Senior Director of Engineering at Atigeo
xPatterns is a big data analytics platform-as-a-service that enables rapid development of enterprise-grade analytical applications. It provides tools, API sets and a management console for building an ELT pipeline with data monitoring and quality gates, a data warehouse for ad-hoc and scheduled querying, analysis, model building and experimentation, tools for exporting data to Cassandra and SolrCloud clusters for real-time access through low-latency/high-throughput (automatically generated) APIs, as well as dashboard and visualization APIs/tools leveraging the available data and models. In this talk I'll share some of the hard lessons we've learned in the past three years while leveraging Cassandra (and Hector) in large-scale enterprise-grade deployments. We will focus on three specific areas in which we identified consistent best practices and design patterns: data model optimization as a result of exporting data from HDFS/Hive/Shark into Cassandra through Spark/Hadoop MR jobs under Mesos with throttling, instrumentation and resilience features; automatically publishing geo-replicated, instrumented and monitored REST APIs on top of the exported Cassandra data; and lessons learned from running Cassandra at scale from 0.6 to 2.0.6, including performance tuning, tips and tricks. You will see live demos of our Publish to NoSQL tools (Spark/Shark, Mesos, Hive, Cassandra), a dashboard application built on top of generated data APIs (D3.js, Cassandra) and xPatterns' monitoring and instrumentation consoles (Graphite, Ganglia, Nagios).
A good data model is key to getting the best performance from Apache Cassandra. The Log Structured Storage Engine and its distributed architecture mean we cannot rely on a paradigm such as Normal Form to evaluate a model. Instead we need to design data models that support the read path of the application. In this talk Aaron Morton will walk through the key principles and patterns of a good CQL3 data model using simple examples.
Cassandra Summit 2014: Apache Cassandra at Telefonica CBSDataStax Academy
Presenter: Antonio Alcocer, Big Data Architect at Stratio
Telefonica is the incumbent telecommunications network operator in Spain and the fourth largest in the world by capitalisation. Cyber security is one of our most successful businesses worldwide: we monitor clients and protect them from attacks. We analyze millions of data points from multiple sources, including social media, DNS records, and the underground internet, to generate alerts and security reports for our clients. This use case required a Big Data component capable of processing the data and extracting information from it in real time; warnings and alerts are time-sensitive, so that security attacks can be dealt with efficiently. Our original architecture was the typical one used for data fusion systems. It included several collectors, a processing layer based on legacy systems, and a data store. The initial setup included a MongoDB database and an ad-hoc application. This solution, however, proved unfit for the specific purpose of dispatching alerts. We proposed using Cassandra and Spark instead. This approach fulfilled our original specifications as intended. Our talk will explain why we migrated the architecture and how the adopted solution based on Spark and Cassandra solved our problem.
You've researched. You've discussed. You've had (multiple) meetings. You've installed. You've tested (hopefully). You've decided. Now what (besides having attended a Cassandra Day)? What else are you going to need to put that Cassandra cluster into beta? Our evangelist team will give you the Cliff Notes to make that next step go as smooth as.... well... as smooth as it can be!
Presenter: Chris Lohfink, Engineer at Pythian
This session will cover a walk-through to provide an understanding of key metrics critical to operating a Cassandra cluster effectively. Without context to the metrics, we just have pretty graphs. With context, we have a powerful tool to determine problems before they happen and to debug production issues more quickly.
Cassandra Summit 2015 - Building a multi-tenant API PaaS with DataStax Enterp...Restlet
Lessons learned by Restlet when deploying DataStax Enterprise search with APISpark. Presentation by Jerome Louvel and Guillaume Blondeau at the Cassandra Summit 2015. Includes 7 challenges and solutions when deploying DataStax.
CON6423: Scalable JavaScript applications with Project NashornMichel Graciano
In the age of cloud computing and highly demanding systems, some new approaches for application architectures such as the event-driven model have been proposed and successfully implemented with Node.js. With the Nashorn JavaScript engine, it is possible to run JavaScript applications directly in the JVM, enabling access to the latest Node.js frameworks while taking advantage of the Java platform’s scalability, manageability, tools, and extensive collection of Java libraries and middleware. This session demonstrates how to use Nashorn to create highly scalable JavaScript applications leveraging the full power of the JVM by using the projects Avatar and Node.js with Avatar.js and Vert.x, highlighting their key benefits, issues, and challenges.
Powering GIS Application with PostgreSQL and Postgres Plus Ashnikbiz
This document provides an overview of Postgres Plus Advanced Server and its features. It begins with introductions to PostgreSQL and PostGIS. It then discusses Postgres Plus Advanced Server's Oracle compatibility, performance enhancements, security features, high availability options, database administration tools, and migration toolkit. The document also provides information on scaling Postgres Plus Advanced Server through partitioning and infinite cache technologies. It concludes with summaries of the replication capabilities of Postgres Plus Advanced Server.
Technical Introduction to PostgreSQL and PPASAshnikbiz
Let's take a look at:
PostgreSQL and the buzz it has created
Architecture
Oracle Compatibility
Performance Features
Security Features
High Availability Features
DBA Tools
User Stories
What’s coming up in v9.3
How to start adopting
This document compares the two major open source databases: MySQL and PostgreSQL. It provides a brief history of each database's development. MySQL prioritized ease-of-use and performance early on, while PostgreSQL focused on features, security, and standards compliance. More recently, both databases have expanded their feature sets. The document discusses the most common uses, features, and performance of each database. It concludes that for simple queries on 2-core machines, MySQL may perform better, while PostgreSQL tends to perform better for complex queries that can leverage multiple CPU cores.
Oracle database connection with the .net developersveerendramb3
Oracle Database 11g provides improved integration with Windows and .NET development. Key highlights include enhanced performance when running Oracle Database on Windows, easier development using Visual Studio tools, and unified management of Oracle and Microsoft servers.
This document discusses PingCAP's Kubernetes operator for TiDB, an open source distributed SQL database. It provides a brief history of PingCAP and the TiDB community. It then gives a technical overview of TiDB's architecture before explaining how the TiDB operator works. The operator allows users to deploy and manage TiDB clusters on Kubernetes through custom resources that are controlled by custom controllers. This provides capabilities like automated scaling, updates, and failover for stateful applications running on Kubernetes. The operator is open source and TiDB is also available as a managed service on GCP Marketplace.
This document provides an overview of PostgreSQL, including its history, capabilities, advantages over other databases, best practices, and references for further learning. PostgreSQL is an open source relational database management system that has been in development for over 30 years. It offers rich SQL support, high performance, ACID transactions, and extensive extensibility through features like JSON, XML, and programming languages.
MySQL 8.0 includes several new features such as a document store for JSON documents, improved replication of JSON documents, and support for Node.js. It focuses on improving performance, security, and capabilities for JSON and NoSQL features while maintaining compatibility with existing SQL features. MySQL 8.0 was in development for 2 years with over 5000 bugs fixed.
This document provides information about Oracle Developer Tools for Visual Studio .NET and Oracle Database Extensions for .NET. It states that the information presented is for informational purposes only and should not be relied upon for purchasing decisions or incorporated into any contract. The document outlines the product direction and includes demonstrations of the tools and extensions.
This document discusses different cloud platforms for hosting Grails applications. It provides an overview of infrastructure as a service (IaaS) models like Amazon EC2 and shared/dedicated virtual private servers, as well as platform as a service (PaaS) options including Amazon Beanstalk, Google App Engine, Heroku, Cloud Foundry, and Jelastic. A comparison chart evaluates these platforms based on factors such as pricing, control, reliability, and scalability. The document emphasizes that competition and changes in the cloud space are rapid and recommends keeping applications loosely coupled and testing platforms using free trials.
First steps into developing an application as a suite of small services, and analysis of tools and architecture approaches to be used.
Topics covered:
1) What is a micro service architecture
2) Advantages in code procedures, team dynamics and scaling
3) How container services such as docker assist in its implementation
4) How to deploy code in a micro services architecture
5) Container Management tools and resource efficiency (mesos, kubernetes, aws container service)
6) Scaling up
By PeoplePerHour team
presented by CTO Spyros Lambrinidis & Senior DevOps Panagiotis Moustafellos @ Docker Athens Meetup 18/02/2015
This document discusses using Ansible for infrastructure automation. It provides examples of how Ansible can be used for provisioning infrastructure, configuring servers, patching, backups, cluster deployment, and scaling. It also gives three use cases: creating a platform for a client, integrating Ansible with other tools like vRA and CyberArk, and automating a two year project involving RedHat and Windows systems. It concludes by discussing common problems providing "DevOps as a service" and introducing Crevise PowerOps to address these.
Tips to drive maria db cluster performance for nextcloudSeveralnines
200
● SSD
2000
● NVMe
4000
Tune for your hardware. Higher is better but avoid over-committing IOPS.
innodb_flush_log_at_trx_commit = 1: Flush logs at each transaction commit for ACID compliance.
innodb_log_buffer_size = 16M-64M: Default is 8M. Increase for more transactions per second.
innodb_log_file_size = 1G: Default is 48M. Increase for more transactions per second.
innodb_flush_method = O_DIRECT: Bypass OS cache for better durability.
innodb_thread_concurrency = 0: Allow InnoDB to manage thread concurrency level.
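As a rough sketch, the five InnoDB settings listed above could be rendered into a server configuration fragment with a few lines of Python. The helper name `render_my_cnf` is hypothetical, and the 32M value is simply an arbitrary pick from the 16M-64M range given in the slides, not a recommendation.

```python
# Values taken from the settings listed above. 32M is an assumed pick from
# the suggested 16M-64M range; tune it for your own workload.
INNODB_SETTINGS = {
    "innodb_flush_log_at_trx_commit": "1",
    "innodb_log_buffer_size": "32M",
    "innodb_log_file_size": "1G",
    "innodb_flush_method": "O_DIRECT",
    "innodb_thread_concurrency": "0",
}


def render_my_cnf(settings, section="mysqld"):
    """Render a dict of options as an INI-style section for the server config."""
    lines = [f"[{section}]"]
    lines += [f"{name} = {value}" for name, value in settings.items()]
    return "\n".join(lines) + "\n"


print(render_my_cnf(INNODB_SETTINGS))
```

Keeping the values in one dictionary makes it easy to review them against the hardware in use before writing them to a real configuration file.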
The Oracle Corporation is an American global computer technology corporation founded in 1977. It primarily develops and markets database management systems and enterprise software. In 2013, Oracle began using Oracle 12C which provided cloud services capabilities. In 2014, Oracle acquired digital marketing company Datalogix for an undisclosed amount.
Trove provides database services and improved in Kilo and Liberty releases. In Kilo, specs were moved to Gerrit, replication was improved with GTID and failover support, and new databases like CouchDB and Vertica were added. In Liberty, backups/restores were added for MongoDB and Redis, Redis was updated, and clustering support improved including multi-master MySQL and Redis clusters. Work also focused on simplifying operations and exposing more management features through Horizon.
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftDataStax Academy
Companies today are innovating with real-time data to deliver truly amazing customer experiences in the moment. Real-time data management for real-time customer experience is core to staying ahead of competition and driving revenue growth. Join Trays to learn how Comcast is differentiating itself from its own historical reputation with Customer Experience strategies.
Introduction to DataStax Enterprise Graph DatabaseDataStax Academy
DataStax Enterprise (DSE) Graph is built to manage, analyze, and search highly connected data. DSE Graph, built on NoSQL Apache Cassandra, delivers continuous uptime along with predictable performance and scales for modern systems dealing with complex and constantly changing data.
Download DataStax Enterprise: Academy.DataStax.com/Download
Start free training for DataStax Enterprise Graph: Academy.DataStax.com/courses/ds332-datastax-enterprise-graph
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraDataStax Academy
DataStax Enterprise Advanced Replication supports one-way distributed data replication from remote database clusters that might experience periods of network or internet downtime. This benefits use cases that require a 'hub and spoke' architecture.
Learn more at http://www.datastax.com/2016/07/stay-100-connected-with-dse-advanced-replication
Advanced Replication docs: https://docs.datastax.com/en/latest-dse/datastax_enterprise/advRep/advRepTOC.html
This document discusses using Docker containers to run Cassandra clusters at Walmart. It proposes transforming existing Cassandra hardware into containers to better utilize unused compute. It also suggests building new Cassandra clusters in containers and migrating old clusters to double capacity on existing hardware and save costs. Benchmark results show Docker containers outperforming virtual machines on OpenStack and Azure in terms of reads, writes, throughput and latency for an in-house application.
The document discusses the evolution of Cassandra's data modeling capabilities over different versions of CQL. It covers features introduced in each version such as user defined types, functions, aggregates, materialized views, and storage attached secondary indexes (SASI). It provides examples of how to create user defined types, functions, materialized views, and SASI indexes in CQL. It also discusses when each feature should and should not be used.
Cisco has a large global IT infrastructure supporting many applications, databases, and employees. The document discusses Cisco's existing customer service and commerce systems (CSCC/SMS3) and some of the performance, scalability, and user experience issues. It then presents a proposed new architecture using modern technologies like Elasticsearch, Cassandra, and microservices to address these issues and improve agility, performance, scalability, uptime, and the user interface.
Data Modeling is one of the first things to sink your teeth into when trying out a new database. That's why we are going to cover this foundational topic in enough detail for you to get dangerous. Data Modeling for relational databases is more than a touch different from the way it's approached with Cassandra. We will address the quintessential query-driven methodology through a couple of different use cases, including working with time series data for IoT. We will also demo a new tool to get you bootstrapped quickly with MovieLens sample data. This talk should give you the basics you need to get serious with Apache Cassandra.
Hear about how Coursera uses Cassandra as the core of its scalable online education platform. I'll discuss the strengths of Cassandra that we leverage, as well as some limitations that you might run into in practice.
In the second part of this talk, we'll dive into how to use the DataStax Java drivers effectively. We'll dig into how the driver is architected, and use this understanding to develop best practices to follow. I'll also share a couple of interesting bugs we've run into at Coursera.
This document promotes DataStax Academy and Certification resources for learning Cassandra, including a three-step process of learning Cassandra, getting certified, and profiting. It lists community evangelists like Luke Tillman, Patrick McFadin, Jon Haddad, and Duy Hai Doan who can provide help and resources.
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonDataStax Academy
This document summarizes three presentations from a Cassandra Meetup:
1. Jason Cacciatore discussed monitoring Cassandra health at scale across hundreds of clusters and thousands of nodes using the reactive stream processing system Mantis.
2. Minh Do explained how Cassandra uses the gossip protocol for tasks like discovering cluster topology and sharing load information. Gossip also has limitations and race conditions that can cause problems.
3. Chris Kalantzis presented Cassandra Tickler, an open source tool he created to help repair operations that get stuck by running lightweight consistency checks on an old Cassandra version or a node with space issues.
Cassandra @ Sony: The good, the bad, and the ugly part 1DataStax Academy
This talk covers scaling Cassandra to a fast-growing user base. Alex and Isaias will cover new best practices and how to work with the strengths and weaknesses of Cassandra at large scale. They will discuss how to adapt to bottlenecks while providing a rich feature set to the PlayStation community.
Cassandra @ Sony: The good, the bad, and the ugly part 2DataStax Academy
The document discusses Cassandra's use by Sony Network Entertainment to handle the large amount of user and transaction data from the growing PlayStation Network. It describes how the relational database they previously used did not scale sufficiently, so they transitioned to using Cassandra in a denormalized and customized way. Some of the techniques discussed include caching user data locally on application servers, secondary indexing, and using a real-time indexer to enable personalized search by friends.
This document provides guidance on setting up server monitoring, application metrics, log aggregation, time synchronization, replication strategies, and garbage collection for a Cassandra cluster. Key recommendations include:
1. Use monitoring tools like Monit, Munin, Nagios, or OpsCenter to monitor processes, disk usage, and system performance. Aggregate all logs centrally with tools like Splunk, Logstash, or Graylog.
2. Install NTP to synchronize server times which are critical for consistency.
3. Use the NetworkTopologyStrategy replication strategy and avoid SimpleStrategy for production.
4. Avoid shared storage and focus on low latency and high throughput using multiple local disks.
5. Understand
Introduction to Data Modeling with Apache CassandraDataStax Academy
This document provides an introduction to data modeling with Apache Cassandra. It discusses how Cassandra data models are designed based on the queries an application will perform, unlike relational databases which are designed based on normalization rules. Key aspects covered include avoiding joins by denormalizing data, using a partition key to group related data on nodes, and controlling the clustering order of columns. The document provides examples of modeling time series and tag data in Cassandra.
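To make the partition key and clustering order ideas concrete, here is a small in-memory Python model of a time series table. It is only a teaching sketch under assumed names (`TimeSeriesTable`, `sensor_id`, `reading_time`); it does not talk to Cassandra, and unlike Cassandra it does not overwrite a row written twice with the same key.

```python
from bisect import insort
from collections import defaultdict


class TimeSeriesTable:
    """Toy stand-in for a table declared roughly as
    PRIMARY KEY ((sensor_id), reading_time)
    WITH CLUSTERING ORDER BY (reading_time DESC).
    """

    def __init__(self):
        # partition key -> rows kept sorted by clustering column
        self._partitions = defaultdict(list)

    def insert(self, sensor_id, reading_time, value):
        # Store the negated time so the sorted list is newest-first,
        # mimicking a descending clustering order.
        insort(self._partitions[sensor_id], (-reading_time, value))

    def latest(self, sensor_id, limit):
        """Answer 'latest N readings for one sensor' from a single partition."""
        rows = self._partitions.get(sensor_id, [])[:limit]
        return [(-neg_time, value) for neg_time, value in rows]


table = TimeSeriesTable()
table.insert("s1", 100, 20.5)
table.insert("s1", 300, 21.0)
table.insert("s1", 200, 20.8)
table.insert("s2", 150, 5.0)
print(table.latest("s1", 2))
```

The table is shaped around the one query it must serve: all readings for a sensor live together, already ordered, so the read touches one partition and needs no sort or join.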
The document discusses different data storage options for small, medium, and large datasets. It argues that relational databases do not scale well for large datasets due to limitations with replication, normalization, sharding, and high availability. The document then introduces Apache Cassandra as a fast, distributed, highly available, and linearly scalable database that addresses these limitations through its use of a hash ring architecture and tunable consistency levels. It describes Cassandra's key features including replication, compaction, and multi-datacenter support.
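The hash ring and tunable consistency mentioned above can be illustrated with a short Python sketch. It makes simplifying assumptions: MD5 stands in for Cassandra's partitioner, each node owns a single token (no virtual nodes), and replicas are simply the next nodes clockwise on the ring. The class and function names are hypothetical.

```python
import hashlib
from bisect import bisect_right


def token(key):
    """Map a string onto the ring (MD5 is an assumed stand-in hash)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes):
        # One token per node, sorted to form the ring.
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key, replication_factor):
        """Return the nodes that would hold a key: the first node clockwise
        from the key's token, plus the following nodes on the ring."""
        start = bisect_right(self._tokens, token(key)) % len(self._ring)
        count = min(replication_factor, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


def overlaps(read_replicas, write_replicas, replication_factor):
    """Reads are guaranteed to see the latest write when R + W > RF."""
    return read_replicas + write_replicas > replication_factor


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
owners = ring.replicas("user:42", replication_factor=3)
print(owners, overlaps(2, 2, 3), overlaps(1, 1, 3))
```

With a replication factor of 3, reading and writing at two replicas each (a quorum) always overlaps on at least one node, while reading and writing at one replica each trades that guarantee for lower latency.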
Enabling Search in your Cassandra Application with DataStax EnterpriseDataStax Academy
This document provides an overview of using DataStax Enterprise (DSE) Search to enable full-text search capabilities in Cassandra applications. It discusses how DSE Search integrates Solr/Lucene indexing with the Cassandra database to allow searching of application data without requiring a separate search cluster, external ETL processes, or custom application code for data management. The document also includes examples of different types of searches that can be performed, such as filtering, faceting, geospatial searches, and joins. It concludes with basic steps for getting started with DSE Search such as creating a Solr core and executing search queries using CQL.
The document discusses common bad habits that can occur when working with Apache Cassandra and provides recommendations to avoid them. Specifically, it addresses issues like sliding back into a relational mindset when the data model is different, improperly benchmarking Cassandra systems, having slow client performance, and neglecting important operations tasks. The presentation provides guidance on how to approach data modeling, querying, benchmarking, driver usage, and operations management in a Cassandra-oriented way.
This document provides an overview and examples of modeling data in Apache Cassandra. It begins with an introduction to thinking about data models and queries before modeling, and emphasizes that Cassandra requires modeling around queries due to its limitations on joins and indexes. The document then provides examples of modeling user, video, and other entity data for a video sharing application to support common queries. It also discusses techniques for handling queries that could become hotspots, such as bucketing or adding random values. The examples illustrate best practices for data duplication, materialized views, and time series data storage in Cassandra.
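The two hotspot techniques named above, bucketing and adding random values, can be sketched as partition key helpers in Python. The function names, the daily bucket size, and the shard count are illustrative assumptions rather than details from the presentation.

```python
import random
from datetime import datetime, timezone


def day_bucketed_key(entity_id, when):
    """Bucketing: split one ever-growing partition into one partition per day."""
    return (entity_id, when.strftime("%Y-%m-%d"))


def salted_key(entity_id, shards, rng=random):
    """Random value: spread writes for a hot entity over a fixed number of shards."""
    return (entity_id, rng.randrange(shards))


def keys_to_read(entity_id, shards):
    """Reads must fan out over every shard and merge the results client-side."""
    return [(entity_id, shard) for shard in range(shards)]


print(day_bucketed_key("video-1", datetime(2015, 9, 22, 13, 0, tzinfo=timezone.utc)))
print(salted_key("video-1", shards=4))
print(keys_to_read("video-1", shards=4))
```

Bucketing keeps reads cheap when queries are naturally bounded by time, whereas salting evens out write load at the cost of reading several partitions, so the choice depends on which side of the workload is hot.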
The document discusses best practices for using Apache Cassandra, including:
- Topology considerations like replication strategies and snitches
- Booting new datacenters and replacing nodes
- Security techniques like authentication, authorization, and SSL encryption
- Using prepared statements for efficiency
- Asynchronous execution for request pipelining
- Batch statements and their appropriate uses
- Improving performance through techniques like the new row cache
This is a two part talk in which we'll go over the architecture that enables Apache Cassandra’s linear scalability as well as how DataStax Drivers are able to take full advantage of it to provide developers with nicely designed and speedy clients extendable to the core.
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc
Most consumers believe they’re making informed decisions about their personal data—adjusting privacy settings, blocking trackers, and opting out where they can. However, our new research reveals that while awareness is high, taking meaningful action is still lacking. On the corporate side, many organizations report strong policies for managing third-party data and consumer consent yet fall short when it comes to consistency, accountability and transparency.
This session will explore the research findings from TrustArc’s Privacy Pulse Survey, examining consumer attitudes toward personal data collection and practical suggestions for corporate practices around purchasing third-party data.
Attendees will learn:
- Consumer awareness around data brokers and what consumers are doing to limit data collection
- How businesses assess third-party vendors and their consent management operations
- Where business preparedness needs improvement
- What these trends mean for the future of privacy governance and public trust
This discussion is essential for privacy, risk, and compliance professionals who want to ground their strategies in current data and prepare for what’s next in the privacy landscape.
Artificial Intelligence is providing benefits in many areas of work within the heritage sector, from image analysis, to ideas generation, and new research tools. However, it is more critical than ever for people, with analogue intelligence, to ensure the integrity and ethical use of AI. Including real people can improve the use of AI by identifying potential biases, cross-checking results, refining workflows, and providing contextual relevance to AI-driven results.
News about the impact of AI often paints a rosy picture. In practice, there are many potential pitfalls. This presentation discusses these issues and looks at the role of analogue intelligence and analogue interfaces in providing the best results to our audiences. How do we deal with factually incorrect results? How do we get content generated that better reflects the diversity of our communities? What roles are there for physical, in-person experiences in the digital world?
Semantic Cultivators : The Critical Future Role to Enable AIartmondano
By 2026, AI agents will consume 10x more enterprise data than humans, but with none of the contextual understanding that prevents catastrophic misinterpretations.
Spark is a powerhouse for large datasets, but when it comes to smaller data workloads, its overhead can sometimes slow things down. What if you could achieve high performance and efficiency without the need for Spark?
At S&P Global Commodity Insights, having a complete view of global energy and commodities markets enables customers to make data-driven decisions with confidence and create long-term, sustainable value. 🌍
Explore delta-rs + CDC and how these open-source innovations power lightweight, high-performance data applications beyond Spark! 🚀
Generative Artificial Intelligence (GenAI) in BusinessDr. Tathagat Varma
My talk for the Indian School of Business (ISB) Emerging Leaders Program Cohort 9. In this talk, I discussed key issues around adoption of GenAI in business - benefits, opportunities and limitations. I also discussed how my research on Theory of Cognitive Chasms helps address some of these issues
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersToradex
Toradex brings robust Linux support to SMARC (Smart Mobility Architecture), ensuring high performance and long-term reliability for embedded applications. Here’s how:
• Optimized Torizon OS & Yocto Support – Toradex provides Torizon OS, a Debian-based easy-to-use platform, and Yocto BSPs for customized Linux images on SMARC modules.
• Seamless Integration with i.MX 8M Plus and i.MX 95 – Toradex SMARC solutions leverage NXP’s i.MX 8M Plus and i.MX 95 SoCs, delivering power efficiency and AI-ready performance.
• Secure and Reliable – With Secure Boot, over-the-air (OTA) updates, and LTS kernel support, Toradex ensures industrial-grade security and longevity.
• Containerized Workflows for AI & IoT – Support for Docker, ROS, and real-time Linux enables scalable AI, ML, and IoT applications.
• Strong Ecosystem & Developer Support – Toradex offers comprehensive documentation, developer tools, and dedicated support, accelerating time-to-market.
With Toradex’s Linux support for SMARC, developers get a scalable, secure, and high-performance solution for industrial, medical, and AI-driven applications.
Do you have a specific project or application in mind where you're considering SMARC? We can help with a free compatibility check and a quick time-to-market.
For more information: https://www.toradex.com/computer-on-modules/smarc-arm-family
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxAnoop Ashok
In today's fast-paced retail environment, efficiency is key. Every minute counts, and every penny matters. One tool that can significantly boost your store's efficiency is a well-executed planogram. These visual merchandising blueprints not only enhance store layouts but also save time and money in the process.
TrsLabs - Fintech Product & Business ConsultingTrs Labs
Hybrid Growth Mandate Model with TrsLabs
Strategic investments, inorganic growth, and business model pivoting are critical activities that businesses don't undertake every day. In cases like these, it may benefit your business to choose a temporary external consultant.
An unbiased plan, driven by clear-cut deliverables and market dynamics and free from the influence of your internal office equations, empowers business leaders to make the right choices.
Getting things done within a budget within a timeframe is key to Growing Business - No matter whether you are a start-up or a big company
Talk to us & Unlock the competitive advantage
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfAbi john
Analyze the growth of meme coins from mere online jokes to potential assets in the digital economy. Explore the community, culture, and utility as they elevate themselves to a new era in cryptocurrency.
How Can I use the AI Hype in my Business Context?Daniel Lehner
Is AI just hype? Or is it the game changer your business needs?
Everyone’s talking about AI but is anyone really using it to create real value?
Most companies want to leverage AI. Few know how.
✅ What exactly should you ask to find real AI opportunities?
✅ Which AI techniques actually fit your business?
✅ Is your data even ready for AI?
If you’re not sure, you’re not alone. This is a condensed version of the slides I presented at a LinkedIn webinar for Tecnovy on 28.04.2025.
Dev Dives: Automate and orchestrate your processes with UiPath MaestroUiPathCommunity
This session is designed to equip developers with the skills needed to build mission-critical, end-to-end processes that seamlessly orchestrate agents, people, and robots.
📕 Here's what you can expect:
- Modeling: Build end-to-end processes using BPMN.
- Implementing: Integrate agentic tasks, RPA, APIs, and advanced decisioning into processes.
- Operating: Control process instances with rewind, replay, pause, and stop functions.
- Monitoring: Use dashboards and embedded analytics for real-time insights into process instances.
This webinar is a must-attend for developers looking to enhance their agentic automation skills and orchestrate robust, mission-critical processes.
👨🏫 Speaker:
Andrei Vintila, Principal Product Manager @UiPath
This session streamed live on April 29, 2025, 16:00 CET.
Check out all our upcoming Dev Dives sessions at https://community.uipath.com/dev-dives-automation-developer-2025/.
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In Francechb3
Restlet: Building a multi-tenant API PaaS with DataStax Enterprise Search
1. Building a multi-tenant API PaaS with DataStax Enterprise Search
Presented by Jérôme Louvel, Chief Geek & Guillaume Blondeau, Technical Detective
2. Agenda
1. Introduction
2. Persistence needs of an API PaaS
3. Selecting DataStax Enterprise Search
4. Main challenges and solutions
5. Conclusion
6. Q&A
4. About the Speakers
● Jérôme Louvel
○ founder & CTO of Restlet, Web API platform vendor
○ created Restlet Framework, first REST framework in 2004
○ contributor to “RESTful Web Services” (O’Reilly, 2007)
○ member of the JAX-RS 1.0 expert group (2007 - 2009)
○ co-author of “Restlet in Action” (Manning, 2012)
○ InfoQ editor covering Web APIs since 2014
● Guillaume Blondeau
○ DevOps engineer at Restlet
○ working on APISpark cloud platform
○ Cassandra Administrator certified by DataStax
6. About APISpark
● Key features
○ visual creation & deployment of data APIs
○ operation of APIs & their local data sources
○ management of any API
● Benefits
○ accessible via web browser, no technical expertise required
○ companies of any size can become API providers
○ get started for free, then pay when the API generates traffic
10. High Scalability & Elasticity
● For API traffic
○ concurrent calls
○ workload types
○ peaks handling
● For data storage
○ number of stores
○ volume of data ...
...
12. High Multi-tenant Density
● Balance between
○ data isolation
○ low cost
● Many customers & projects
○ sharing persistence infrastructure
○ isolated data stores
● Many users & groups
○ personal data
○ shared group data
14. Step 1: Prototyping with AWS NoSQL
● Started with SimpleDB
○ zero ops, highly available & low latency
○ mono-region & limited query capabilities
● Upgraded to DynamoDB
○ better scalability & predictability
○ not really for multi-tenant use cases (soft limits)
○ not very elastic (provisioned throughput)
● Other limitations
○ unable to develop and test locally (MySQL mode)
○ strong AWS lock-in
15. Step 2: Moving to Apache Cassandra
● For APISpark beta version
○ increasing multi-tenancy needs
○ increasing cost concerns
● Benefits
○ fully open source & free (vendor support)
○ on-premise deployments possible
○ proven scalability on AWS (Netflix)
○ richer query capabilities
○ natively multi-region
16. Step 3: Upgrading to DataStax Enterprise
● For APISpark GA
○ DataStax certified stack
○ production support
● Improved capabilities
○ much richer query capabilities with Solr integration
○ administration console
○ command line tooling
○ comprehensive documentation
● Still open source foundation
○ limited vendor lock-in
○ mature open source components
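As an illustration of the "richer query capabilities with Solr integration" bullet: DSE Search accepts a Solr query string through the `solr_query` pseudo-column of a CQL SELECT. The Python helper below is only a sketch that builds such a statement; the keyspace, table and field names are hypothetical and do not come from the slides.

```python
def solr_select(keyspace: str, table: str, solr_query: str, limit: int = 10) -> str:
    """Build a CQL SELECT that delegates filtering to DSE Search via solr_query."""
    # CQL string literals escape a single quote by doubling it.
    escaped = solr_query.replace("'", "''")
    return (
        f"SELECT * FROM {keyspace}.{table} "
        f"WHERE solr_query = '{escaped}' LIMIT {limit};"
    )


# Hypothetical entity store keyspace and entity table.
print(solr_select("es_42", "contact", "lastname:Lou*"))
```

The same statement can be sent through any CQL driver, so the application keeps a single access path for both key lookups and full-text search.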
19. I. Deploying Across Multiple Regions
● Using Ec2MultiRegionSnitch
● 1 Entity Store = 1 Keyspace
○ Each keyspace can set its own replication policy
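A per-keyspace replication policy could be expressed as follows. This is a minimal sketch, assuming `NetworkTopologyStrategy` and data center names that follow the EC2 region naming used by the EC2 snitches; the keyspace name and replica counts are made up for illustration.

```python
def create_keyspace_cql(keyspace: str, replicas_per_dc: dict) -> str:
    """Build a CREATE KEYSPACE statement with one replica count per data center."""
    if not replicas_per_dc:
        raise ValueError("at least one data center is required")
    options = ["'class': 'NetworkTopologyStrategy'"]
    # Sort for a deterministic statement regardless of dict ordering.
    for dc, replicas in sorted(replicas_per_dc.items()):
        options.append(f"'{dc}': {int(replicas)}")
    body = ", ".join(options)
    return "CREATE KEYSPACE IF NOT EXISTS " + keyspace + " WITH replication = {" + body + "};"


# Hypothetical store replicated three times in one region and twice in another.
print(create_keyspace_cql("es_42", {"us-east": 3, "eu-west": 2}))
```

Because the policy lives on the keyspace, one entity store can stay in a single region while another is replicated across several, without touching the cluster-wide configuration.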
20. II. Isolating Customer Data & Keeping Cost Low
● 1 Entity Store = 1 Keyspace
○ Data isolated in File System and Memory
● Complementary benefit
○ ACL per keyspace
[Diagram labels: Keyspace, Table]
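The "1 Entity Store = 1 Keyspace" mapping and the per-keyspace ACL might look like the sketch below. The `es_<id>` naming scheme and the role name are assumptions made for this example, not the naming APISpark actually uses; the `GRANT ... ON KEYSPACE` statement itself is standard CQL.

```python
import re

# Unquoted keyspace names: a letter followed by letters, digits or underscores,
# at most 48 characters in total.
_VALID_KEYSPACE = re.compile(r"^[a-z][a-z0-9_]{0,47}$")


def keyspace_for_store(store_id: int) -> str:
    """Derive the keyspace name of an entity store (hypothetical naming scheme)."""
    name = f"es_{store_id}"
    if not _VALID_KEYSPACE.match(name):
        raise ValueError(f"invalid keyspace name: {name!r}")
    return name


def grant_store_access(store_id: int, role: str) -> str:
    """Build the CQL statement giving one role full access to one store only."""
    return f"GRANT ALL PERMISSIONS ON KEYSPACE {keyspace_for_store(store_id)} TO {role};"


print(grant_store_access(42, "tenant_acme"))
```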
22. IV. Dealing with Dynamic Schema Changes (1/3)
[Diagram labels: ALTER TABLE DROP, ALTER TABLE ADD]
23. IV. Dealing with Dynamic Schema Changes (2/3)
User Action on Entity Store | Action performed in DB
Create Entity | CQL: “CREATE TABLE <tableName>” + Solr Core creation
Delete Entity | CQL: “DROP TABLE <tableName>”
Create Property | CQL: “ALTER TABLE ADD <columnName> <type>” + Solr Core schema update
Delete Property | CQL: “ALTER TABLE DROP <columnName>” + Solr Core schema update
Add Property in composite | Java: Alter JSON for all rows
Delete Property in composite | Java: Alter JSON for all rows
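The mapping in this table can be sketched as a small dispatcher. The code is Python rather than the Java mentioned on the slide, the action names and the `id text PRIMARY KEY` column are assumptions, and the Solr core creation and schema updates are deliberately left out.

```python
import json


def cql_for_action(action, table, column=None, cql_type=None):
    """Return the CQL statement for a user action on an entity store."""
    if action == "create_entity":
        # Hypothetical minimal table; real entities would carry more columns.
        return f"CREATE TABLE {table} (id text PRIMARY KEY);"
    if action == "delete_entity":
        return f"DROP TABLE {table};"
    if action == "create_property":
        if not column or not cql_type:
            raise ValueError("create_property needs a column name and a type")
        return f"ALTER TABLE {table} ADD {column} {cql_type};"
    if action == "delete_property":
        if not column:
            raise ValueError("delete_property needs a column name")
        return f"ALTER TABLE {table} DROP {column};"
    raise ValueError(f"unknown action: {action}")


def add_property_in_composite(json_rows, name, default=None):
    """Rewrite every stored JSON document so that it contains the new property."""
    updated = []
    for raw in json_rows:
        doc = json.loads(raw)
        doc.setdefault(name, default)  # keep an existing value if present
        updated.append(json.dumps(doc, sort_keys=True))
    return updated


def delete_property_in_composite(json_rows, name):
    """Rewrite every stored JSON document without the removed property."""
    updated = []
    for raw in json_rows:
        doc = json.loads(raw)
        doc.pop(name, None)
        updated.append(json.dumps(doc, sort_keys=True))
    return updated


print(cql_for_action("create_property", "contact", "age", "int"))
```

The composite cases differ from the others because no schema statement is involved: every row has to be read, modified and written back, which makes their cost proportional to the amount of stored data.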
24. IV. Dealing with Dynamic Schema Changes (3/3)
● Advantages
○ flexibility compared to RDBMS
■ no lock
○ available actions
■ add / drop / rename column
■ change type of column
● Limitations
○ schema deployment can take time
○ in some edge cases can’t recreate columns
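Since schema deployment can take time, a caller may want to wait until all nodes report the same schema version before using a new table or column. The polling loop below is a sketch under that assumption: `fetch_versions` is a hypothetical callable that would return the `schema_version` values read from `system.local` and `system.peers`, and the timeout values are arbitrary.

```python
import time


def wait_for_schema_agreement(fetch_versions, timeout_s=30.0, interval_s=0.5,
                              sleep=time.sleep, clock=time.monotonic):
    """Poll until every node reports the same schema version or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        versions = set(fetch_versions())
        if len(versions) == 1:
            return True  # all nodes agree
        if clock() >= deadline:
            return False  # still diverging, let the caller decide what to do
        sleep(interval_s)


# Simulated cluster that converges on the third poll.
polls = iter([["v1", "v2"], ["v1", "v2"], ["v2", "v2"]])
print(wait_for_schema_agreement(lambda: next(polls), interval_s=0.0))
```

Injecting `sleep` and `clock` keeps the helper testable without real delays.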
25. V. High Multi-tenant Density (1/2)
[Chart: schema deployment time with growing number of tables]
26. V. High Multi-tenant Density (2/2)
● Challenge
○ large number of C* tables & Solr cores
○ memory usage (e.g. 1 C* table takes more than 1 MB of heap)
● Solutions
○ adjust JVM memory settings
○ create additional clusters when needed
○ deprovision unused Entity Stores
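The heap figure above gives a rough capacity bound, and deprovisioning needs a rule for what counts as unused. The sketch below treats 1 MB per table as a lower bound, so the result is an optimistic estimate; the heap budget, the idle threshold and the store names are hypothetical.

```python
from datetime import datetime, timedelta

HEAP_MB_PER_TABLE = 1.0  # lower bound quoted on the slide ("more than 1 MB")


def min_heap_for_tables_mb(table_count, mb_per_table=HEAP_MB_PER_TABLE):
    """Lower bound on the heap consumed by table metadata alone."""
    return table_count * mb_per_table


def max_tables_for_heap(heap_budget_mb, mb_per_table=HEAP_MB_PER_TABLE):
    """Upper bound on the number of tables that fit in a given heap budget."""
    return int(heap_budget_mb // mb_per_table)


def stores_to_deprovision(last_access, now, max_idle):
    """Return the entity stores that have not been accessed within max_idle."""
    return sorted(store for store, seen in last_access.items() if now - seen > max_idle)


now = datetime(2015, 9, 1)
last_access = {
    "es_1": datetime(2015, 8, 30),
    "es_2": datetime(2015, 3, 1),
}
print(min_heap_for_tables_mb(4000))
print(max_tables_for_heap(2048))
print(stores_to_deprovision(last_access, now, timedelta(days=90)))
```

Once the table count approaches the bound, the remaining options are the ones listed on the slide: more heap, another cluster, or fewer live stores.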
32. Conclusion
● Special use case of DataStax Enterprise
○ not a lot of shared knowledge about it
○ great support from DataStax
○ DSE is a good fit despite some challenges
● Looking forward to DSE 4.8!
○ User Defined Types with Solr indexing
○ live indexing of C* data into Solr
○ improved overall performance