The columnar roadmap: Apache Parquet and Apache Arrow (Julien Le Dem)
This document discusses Apache Parquet and Apache Arrow, open source projects for columnar data formats. Parquet is an on-disk columnar format that optimizes I/O performance through compression and projection pushdown. Arrow is an in-memory columnar format that maximizes CPU efficiency through vectorized processing and SIMD, and aims to serve as a standard in-memory format between systems. The document outlines how Arrow builds on Parquet's success and provides benefits such as reduced serialization overhead and the ability to share functionality through its ecosystem. It also describes how the Parquet and Arrow representations are integrated through techniques like vectorized reading and predicate pushdown.
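Two of the ideas above can be sketched in a few lines of plain Python. This is a toy illustration under assumed data layouts, not the real Parquet or Arrow APIs: columnar projection means a query touches only the columns it needs, and predicate pushdown uses per-row-group min/max statistics to skip whole chunks without reading them.

```python
# Hypothetical file layout: each row group stores columns separately,
# with min/max statistics per column (as Parquet does).
row_groups = [
    {"stats": {"price": (1, 40)},   "cols": {"price": [1, 25, 40],    "sku": ["a", "b", "c"]}},
    {"stats": {"price": (90, 120)}, "cols": {"price": [90, 100, 120], "sku": ["d", "e", "f"]}},
]

def scan(groups, column, lo, hi):
    """Return values of `column` in [lo, hi], touching only that column
    and skipping any row group whose stats prove no value can match."""
    out = []
    for g in groups:
        gmin, gmax = g["stats"][column]
        if gmax < lo or gmin > hi:
            continue  # predicate pushdown: the whole group is skipped unread
        out.extend(v for v in g["cols"][column] if lo <= v <= hi)
    return out

print(scan(row_groups, "price", 95, 200))  # [100, 120]
```

Note that the first row group is never decoded at all: its max price (40) rules it out, which is exactly the I/O saving the formats are designed for.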
ArangoDB is an open source multi-model NoSQL database that can be used as a document store, key-value store, and graph database. It provides a query language called AQL that is similar to SQL. Documents and data can be easily extended and manipulated using JavaScript. ArangoDB is highly performant, space efficient, and can scale horizontally. It has been in development since 2011 with the goal of providing a full-featured database while avoiding the downsides of other NoSQL solutions.
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot (Altinity Ltd)
This document provides an overview of building a real-time analytics application with Apache Pulsar and Apache Pinot. It introduces Mary Grygleski and Mark Needham, describes what real-time analytics is, and discusses the properties of real-time analytics systems. It then demonstrates how to ingest data from the Wikimedia recent changes feed into Pulsar and Pinot for real-time analytics and builds a dashboard with the data using Streamlit.
A practical introduction to Apache Solr.
Slides for the NeoCom 2020 days at the University of Zaragoza.
https://eina.unizar.es/noticias/neocom-2020
Notebooks @ Netflix: From analytics to engineering with Jupyter notebooks (Michelle Ufford)
Slides from JupyterCon 2018 in NYC on 8/23/2018.
Notebooks have moved beyond a niche solution at Netflix; they are now the critical path for how everyone runs jobs against the company’s data platform. From creating original content to delivering bufferless streaming, Netflix relies on notebooks to inform decisions and fuel experiments across the company. Netflix also uses notebooks to power its machine learning infrastructure and run over 150,000 jobs against its 100 PB cloud-based data warehouse every day. The goal is to deliver a compelling notebooks experience that simplifies end-to-end workflows for every type of user. To enable this, Netflix is investing deeply in notebook infrastructure and open source projects such as nteract.
In this talk, Michelle Ufford and Kyle Kelley share interesting ways Netflix uses data and some of the big bets the company is making on notebooks. Topics will include architecture, kernels, UIs, and Netflix’s open source collaborations with projects such as Jupyter, nteract, pandas, and Spark.
Centralized Logging System Using ELK Stack (Rohit Sharma)
The document discusses setting up a centralized logging system (CLS) using the ELK stack. The ELK stack consists of Logstash to capture and filter logs, Elasticsearch to index and store logs, and Kibana to visualize logs. Logstash agents on each server ship logs to Logstash, which filters and sends logs to Elasticsearch for indexing. Kibana queries Elasticsearch and presents logs through interactive dashboards. A CLS provides benefits like log analysis, auditing, compliance, and a single point of control. The ELK stack is an open-source solution that is scalable, customizable, and integrates with other tools.
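The division of labour described above can be sketched with a miniature pipeline in plain Python. This is purely illustrative, with made-up log lines and none of the real Logstash/Elasticsearch/Kibana APIs: one function parses raw lines into documents, a dict stands in for the index, and a query function plays the role of the dashboard.

```python
import re
from collections import defaultdict

LOG_RE = re.compile(r"(?P<level>\w+) (?P<msg>.+)")

def parse(line):
    """Logstash's role: capture a raw line and filter it into a document."""
    m = LOG_RE.match(line)
    return {"level": m.group("level"), "msg": m.group("msg")}

index = defaultdict(list)  # Elasticsearch's role: an index from level -> docs

def ingest(doc):
    index[doc["level"]].append(doc)

def query(level):
    """Kibana's role: query the index and present the results."""
    return [d["msg"] for d in index[level]]

for line in ["ERROR disk full", "INFO started", "ERROR timeout"]:
    ingest(parse(line))

print(query("ERROR"))  # ['disk full', 'timeout']
```

The benefit of centralization shows even at this scale: once all servers ship into one index, a single query surfaces every matching event regardless of origin.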
Ambari Views provide a common user experience framework for Hadoop ecosystem components. Views can be contributed as plugins to embed interfaces in Ambari for operators, data workers, and others. The goals of Views are to provide a single point of entry through a common URL, a pluggable UI framework where Views can be shared, and centralized authorization. Example Views include queue managers, resource utilization dashboards, and query editors.
Kibana is a highly customizable dashboarding and visualization platform that allows users to perform flexible analytics and visualization of real-time streaming data through an intuitive interface. It allows users to easily create various chart types like bar charts, line plots, scatter plots, histograms, and pie charts to better understand large volumes of data. Some of its key features include customizable dashboards with components like time pickers, queries, filters, charts and tables for sharing and embedding insights.
GlusterFs Architecture & Roadmap - LinuxCon EU 2013 (Gluster.org)
GlusterFS is a scale-out distributed file system that aggregates storage over a network to provide a single unified namespace. It has a modular architecture and runs on commodity hardware without external metadata servers. Future directions for GlusterFS include distributed geo-replication, file snapshots, and erasure coding support. Challenges include improving scalability, supporting hard links and renames, reducing monitoring overhead, and lowering costs.
Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky (Databricks)
This document discusses efficient Spark analytics on encrypted data using Parquet modular encryption. It provides an overview of the problem of protecting sensitive data at rest while preserving analytics performance. It then describes Parquet modular encryption which enables columnar projection, predicate pushdown and fine-grained access control on encrypted Parquet data. Finally, it demonstrates a connected car use case and shows the performance implications of encryption on Spark analytics are minimal.
Getting Microservices and Legacy to Play Nicely Together with Event-Driven Ar... (VMware Tanzu)
The document discusses techniques for modernizing legacy systems, including digital decoupling. Digital decoupling aims to isolate legacy systems to break the cycle of increasing costs when adding features. It unlocks constrained legacy data and delivers new business value on a modern cloud architecture. This allows replacing the core systems over time while continuously delivering value. The key techniques discussed are using microservices, change data capture, event-driven architectures, and domain-driven design to begin digitally decoupling legacy systems.
Presented "A Cloud Journey - Move to the Oracle Cloud" on behalf of Ricardo Gonzalez at the Bulgarian Oracle User Group Spring Conference 2019. The presentation discusses various methods of migrating to the Oracle Cloud and recommends which tool to use (and where to find it), especially when Zero Downtime Migration is desired; the new Zero Downtime Migration tool is described and discussed in detail. More information: http://www.oracle.com/goto/move
MySQL for Oracle Developers and the companion MySQL for Oracle DBAs were two presentations for the 2006 MySQL Conference and Expo. They were specifically designed to help Oracle professionals understand the usage, syntax, and differences between MySQL and Oracle.
Real-time Data Streaming from Oracle to Apache Kafka (confluent)
Dbvisit is a New Zealand-based company, with offices worldwide, that provides software to replicate data from Oracle databases to Apache Kafka in real time. Its Dbvisit Replicate Connector is a plugin for Kafka Connect that replicates database table changes to Kafka topics with minimal impact, and also generates metadata topics. Dbvisit focuses solely on Oracle databases and replication, has proprietary log-mining technology, and supports Oracle back to version 9.2. The company has over 1,300 customers globally and offers perpetual or term licensing models for its replication software, along with support plans. Dbvisit is a good fit for organizations using Oracle that want to offload reporting, enable real-time analytics, and integrate data into Kafka cost-effectively.
Kibana is a data visualization tool that is part of the ELK stack (Elasticsearch, Logstash, Kibana) and allows users to search, analyze, and visualize data stored in Elasticsearch. The document discusses Kibana's essential features including Discover to query data, Visualize to create visualizations, and Dashboard to combine them. It also covers additional tools like Dev Tools, X-Pack plugins, and Machine Learning capabilities.
BW Migration to HANA Part 2 - SUM DMO Tool for SAP Upgrade & Migration (Linh Nguyen)
This series of publications provides an overview and explanation of the major steps and considerations for BW on HANA migrations from anyDB (any source database). The complex procedure involves:
1) Preparatory work in the BW system
2) SUM DMO Upgrade and Actual migration
3) Post processing on the migrated systems
This part focuses on the SUM DMO tool used for the migration, its prerequisites, optimization, and the actual migration steps.
By OZSoft Consulting for ITConductor.com
Author: Terry Kempis
Editor: Linh Nguyen
The Hows and Whys of a Distributed SQL Database - Strange Loop 2017 (Alex Robinson)
Until recently, developers have had to deal with some serious tradeoffs when picking a database technology. One could pick a SQL database and deal with their eventual scaling problems or pick a NoSQL database and have to work around their lack of transactions, strong consistency, and/or secondary indexes. However, a new class of distributed database engines is emerging that combines the transactional consistency guarantees of traditional relational databases with the horizontal scalability and high availability of popular NoSQL databases.
In this talk, we'll examine the history of databases to see how we got here, covering the motivations for this new class of systems and why developers should care about them. We'll then take a deep dive into the key design choices behind one open source distributed SQL database, CockroachDB, that enable it to offer such properties and compare them to past SQL and NoSQL designs. We will look specifically at how to achieve the easy deployment and management of a scalable, self-healing, strongly-consistent database with techniques such as dynamic sharding and rebalancing, consensus protocols, lock-free transactions, and more.
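Dynamic sharding and rebalancing, mentioned above, can be sketched in a few lines. This is a deliberately naive model under assumed thresholds, not CockroachDB's implementation: keys live in contiguous sorted ranges, a range that grows past a size limit splits in two, and the new range is assigned to another node as a stand-in for rebalancing.

```python
import bisect

SPLIT_AT = 4  # assumed split threshold (real systems split by bytes, e.g. 512 MiB)
ranges = [{"start": "", "keys": [], "node": 0}]  # kept sorted by start key

def find(key):
    """Locate the range owning `key` by binary search over start keys."""
    starts = [r["start"] for r in ranges]
    return ranges[bisect.bisect_right(starts, key) - 1]

def put(key):
    r = find(key)
    bisect.insort(r["keys"], key)
    if len(r["keys"]) > SPLIT_AT:          # dynamic split when the range grows
        mid = len(r["keys"]) // 2
        new = {"start": r["keys"][mid], "keys": r["keys"][mid:],
               "node": (r["node"] + 1) % 3}  # naive rebalancing across 3 nodes
        r["keys"] = r["keys"][:mid]
        ranges.insert(ranges.index(r) + 1, new)

for k in ["ant", "bee", "cat", "dog", "eel", "fox"]:
    put(k)

print([(r["start"], r["keys"]) for r in ranges])
# [('', ['ant', 'bee']), ('cat', ['cat', 'dog', 'eel', 'fox'])]
```

The point of the model: routing a key to its shard is just a binary search over range boundaries, so the key space can keep splitting and moving without any client needing a global remapping table.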
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache (Dremio Corporation)
From DataEngConf 2017 - Everybody wants to get to data faster. As we move from general solutions to specific optimization techniques, the performance impact grows. This talk discusses how layering in-memory caching, columnar storage, and relational caching can combine to substantially improve data science and analytical workloads. It includes a detailed overview of how Apache Arrow, Calcite, and Parquet can be used to achieve improvements of multiple orders of magnitude over what is currently possible.
This document provides an overview of large scale graph analytics and JanusGraph. It discusses graph databases and their use cases. JanusGraph is presented as an open source graph database that can scale to billions of vertices and edges across multiple storage backends like HBase, Cassandra and Bigtable. It uses the TinkerPop framework and Gremlin query language. JanusGraph supports ACID transactions, external indices, and evolving schemas. Example graph queries are demonstrated using the Gremlin console.
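The traversal style described above can be illustrated with a tiny adjacency-list graph. This is a Gremlin-flavoured sketch in plain Python over invented data; real JanusGraph queries go through the TinkerPop framework and the Gremlin language itself.

```python
# Hypothetical graph: vertex -> list of (edge_label, destination) pairs.
edges = {
    "alice": [("knows", "bob"), ("knows", "carol")],
    "bob":   [("works_at", "acme")],
    "carol": [("works_at", "acme")],
}

def out(vertices, label):
    """Rough analogue of Gremlin's .out(label): follow outgoing edges
    with the given label from every vertex in the current frontier."""
    return [dst for v in vertices for (lbl, dst) in edges.get(v, []) if lbl == label]

# "Where do the people Alice knows work?"
# roughly: g.V('alice').out('knows').out('works_at')
print(out(out(["alice"], "knows"), "works_at"))  # ['acme', 'acme']
```

Chaining steps like this, where each step maps a frontier of vertices to the next, is the core idea behind Gremlin's composable traversal syntax.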
The document discusses various components of the ELK stack including Elasticsearch, Logstash, Kibana, and how they work together. It provides descriptions of each component, what they are used for, and key features of Kibana such as its user interface, visualization capabilities, and why it is used.
Kafka Streams State Stores Being Persistent (confluent)
This document discusses Kafka Streams state stores. It provides examples of using different types of windowing (tumbling, hopping, sliding, session) with state stores. It also covers configuring state store logging, caching, and retention policies. The document demonstrates how to define windowed state stores in Kafka Streams applications and discusses concepts like grace periods.
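The simplest of the window types listed, the tumbling window, can be sketched in pure Python. This is an illustration with made-up events rather than the Kafka Streams API: events are bucketed into fixed, non-overlapping time windows, and a dict stands in for the windowed state store that holds per-window counts.

```python
from collections import defaultdict

WINDOW_MS = 1000  # assumed window size: 1-second tumbling windows

def tumbling_counts(events):
    """events: (timestamp_ms, key) pairs -> {(window_start_ms, key): count}"""
    store = defaultdict(int)              # stand-in for the windowed state store
    for ts, key in events:
        window_start = ts - ts % WINDOW_MS  # align timestamp to window boundary
        store[(window_start, key)] += 1
    return dict(store)

events = [(100, "a"), (900, "a"), (1100, "a"), (1200, "b")]
print(tumbling_counts(events))
# {(0, 'a'): 2, (1000, 'a'): 1, (1000, 'b'): 1}
```

Hopping and sliding windows differ only in that one event can land in several overlapping buckets, and session windows grow with activity instead of having fixed boundaries; grace periods govern how long late events may still update a closed bucket.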
Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka (Kai Wähner)
Streaming all over the World: Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka.
Learn about various case studies for event streaming with Apache Kafka across industries. The talk explores architectures for real-world deployments from Audi, BMW, Disney, Generali, PayPal, Tesla, Unity, Walmart, William Hill, and more. Use cases include fraud detection, mainframe offloading, predictive maintenance, cybersecurity, edge computing, track & trace, live betting, and much more.
The document provides instructions for configuring single sign-on (SSO) with an SAP HANA database using Kerberos authentication and Microsoft Active Directory. It describes the necessary steps to set up hostname resolution, configure the SAP HANA database server for Kerberos, create an SAP HANA service user in Active Directory, generate a keytab file, create external SAP HANA database users, and verify the SSO configuration. Troubleshooting tips are provided in an appendix. The goal is to enable users to authenticate with the SAP HANA database after logging into the Active Directory domain, without needing to re-enter credentials.
A brief introduction to Apache Kafka and its usage as a platform for streaming data. It introduces some of the newer components of Kafka that help make this possible, including Kafka Connect, a framework for capturing continuous data streams, and Kafka Streams, a lightweight stream processing library.
Big Data! Great! Now What? #SymfonyCon 2014 (Ricard Clau)
Big Data is one of the new buzzwords in the industry. Everyone is using NoSQL databases. MySQL is not cool anymore. But... do we really have big data? Where should we store it? Are the traditional RDBMS databases dead? Is NoSQL the solution to our problems? And most importantly, how can PHP and Symfony2 help with it?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational? (DATAVERSITY)
Wordnik migrated from a MySQL relational database to the non-relational MongoDB database for 5 key reasons: speed, stability, scaling, simplicity, and fitting their object model better. They tested MongoDB extensively, iteratively improving their data mapping and access patterns. The migration was done without downtime by switching between the databases. While inserts were much faster in MongoDB, updates could be slow due to disk I/O. Wordnik addressed this through optimizations like pre-fetching on updates and moving to local storage. Overall, MongoDB was a better fit for Wordnik's large and evolving datasets.
NoSQL databases should not be chosen just because a system is slow or to replace an RDBMS. The appropriate choice depends on factors like the nature of the data, how the data scales, and whether ACID properties are needed. NoSQL databases are categorized by data model (document, column family, graph, key-value store), which affects querying. Other considerations include scalability based on the CAP theorem and operational factors such as the distribution model and whether there is a single point of failure. The best choice depends on the specific requirements; choosing incorrectly risks losing data.
This document provides an overview and summary of key concepts related to advanced databases. It discusses relational databases including MySQL, SQL, transactions, and ODBC. It also covers database topics like triggers, indexes, and NoSQL databases. Alternative database systems like graph databases, triplestores, and linked data are introduced. Web services, XML, and data journalism are also briefly summarized. The document provides definitions and examples of these technical database terms and concepts.
Reuven Lerner's first talk from Open Ruby Day, at Hi-Tech College in Herzliya, Israel, on June 27th 2010. An overview of what makes Rails a powerful framework for Web development -- what attracted Reuven to it, what are the components that most speak to him, and why others should consider Rails for their Web applications.
- Data modeling for NoSQL databases differs from relational modeling and requires designing the data model around access patterns rather than object structure. Key differences include the lack of joins, so data needs to be duplicated, and modeling the data in a way that works for querying, indexing, and retrieval speed.
- The data model should focus on making the most of features like atomic updates, inner indexes, and unique identifiers. It's also important to consider how data will be added, modified, and retrieved factoring in object complexity, marshalling/unmarshalling costs, and index maintenance.
- The _id field can be tailored to the access patterns, such as using dates for time-series data to keep recent data clustered together.
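The idea of tailoring the _id to the access pattern can be sketched as follows. The field names and format here are hypothetical, not Wordnik's actual schema: prefixing the _id with a coarse date bucket keeps a time series clustered by day, so a day's worth of data can be fetched with a cheap prefix or range condition on _id alone.

```python
from datetime import datetime, timezone

def make_id(sensor, ts):
    """Build an _id whose prefix is the UTC day, so documents for the same
    day sort (and cluster) together, followed by source and exact time."""
    return f"{ts.strftime('%Y%m%d')}:{sensor}:{int(ts.timestamp())}"

ts = datetime(2014, 5, 1, 12, 30, tzinfo=timezone.utc)
doc_id = make_id("s1", ts)
print(doc_id)                          # 20140501:s1:1398947400
print(doc_id.startswith("20140501"))   # True: a day is one _id range scan
```

Because _id is always indexed, this layout turns "all of today's readings" into a range query on the primary key, with no secondary index to maintain.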
SortaSQL is a proposal to add seamless horizontal scalability to SQL databases by using the filesystem to store and retrieve data. The SQL database would store metadata and handle queries, while an embedded key-value store manages record storage on files in the local or distributed filesystem. This allows queries to scale across many servers by letting the filesystem handle replication, performance and locking of distributed data files. The architecture involves an application communicating with PostgreSQL over SQL, which uses a SortaSQL plugin to retrieve rows from Kyoto Cabinet key-value files on the POSIX filesystem. Case studies at CloudFlare show how a 400GB per day dataset can be efficiently stored and queried at scale using this approach.
The lightning talks covered various Netflix OSS projects including S3mper, PigPen, STAASH, Dynomite, Aegisthus, Suro, Zeno, Lipstick on GCE, AnsWerS, and IBM. 41 projects were discussed and the need for a cohesive Netflix OSS platform was highlighted. Matt Bookman then gave a presentation on running Lipstick and Hadoop on Google Cloud Platform using Google Compute Engine and Cloud Storage. He demonstrated running Pig jobs on Compute Engine and discussed design considerations for cloud-based Hadoop deployments. Finally, Peter Sankauskas from @Answers4AWS discussed initial ideas around CloudFormation for Asgard and deploying various Netflix OSS projects.
Introducing NoSQL and MongoDB to complement Relational Databases (AMIS SIG 14...) (Lucas Jellema)
This presentation gives a brief overview of the history of relational databases, ACID, and SQL, and presents some of their key strengths and potential weaknesses. It introduces the rise of NoSQL - why it arose, what it entails, and when to use it. The presentation focuses on MongoDB as a prime example of a NoSQL document store and shows how to interact with MongoDB from JavaScript (Node.js) and Java.
This document provides an overview of patterns for scalability, availability, and stability in distributed systems. It discusses general recommendations like immutability and referential transparency. It covers scalability trade-offs around performance vs scalability, latency vs throughput, and availability vs consistency. It then describes various patterns for scalability including managing state through partitioning, caching, sharding databases, and using distributed caching. It also covers patterns for managing behavior through event-driven architecture, compute grids, load balancing, and parallel computing. Availability patterns like fail-over, replication, and fault tolerance are discussed. The document provides examples of popular technologies that implement many of these patterns.
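One of the state-management patterns above, sharding by key, can be shown in a few lines. This is an illustrative sketch with assumed node names, not any particular system's implementation: a stable hash of the key picks a shard, so state is partitioned across nodes without coordination (real systems typically add virtual nodes or consistent hashing to ease rebalancing).

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical shard servers

def shard_for(key):
    """Map a key to a node via a stable hash, so every client computes
    the same placement with no central lookup service."""
    digest = hashlib.sha256(key.encode()).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]

placement = {k: shard_for(k) for k in ["user:1", "user:2", "user:3"]}
print(placement)

# The same key always lands on the same node:
assert shard_for("user:1") == placement["user:1"]
```

The trade-off the talk highlights applies directly here: simple modulo hashing redistributes almost every key when the node count changes, which is why production systems layer consistent hashing or range ownership on top of this basic idea.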
Slides for a talk.
Talk abstract:
In the dark of the night, if you listen carefully enough, you can hear databases cry. But why? As developers, we rarely consider what happens under the hood of widely used abstractions such as databases. As a consequence, we rarely think about the performance of databases. This is especially true to less widespread, but often very useful NoSQL databases.
In this talk we will take a close look at NoSQL database performance, peek under the hood of the most frequently used features to see how they affect performance and discuss performance issues and bottlenecks inherent to all databases.
This document discusses NoSQL databases for .NET developers. It begins with an introduction to NoSQL and why it is gaining popularity. It then covers the main types of NoSQL databases - document stores, key-value stores, graph databases, and object databases - and examples of databases for each type. It also discusses how .NET developers can interface with different NoSQL databases either through native .NET clients or REST APIs. The document concludes by noting that NoSQL is well-suited for cloud databases and provides an example of using AWS SimpleDB from .NET.
This document discusses how organizations will need to adapt their data infrastructure and software models as Moore's Law ends and data volumes continue growing exponentially. It outlines how traditional clustering, databases, and application servers will no longer scale to meet these new demands. New distributed, dynamically adaptive approaches like NoSQL data stores, functional programming, and eventual consistency models are needed. Hardware is also evolving to support exabyte storage, tens of thousands of CPU cores, and networked memory, requiring new software architectures.
Modern software architectures - PHP UK Conference 2015Ricard Clau
The web has changed. Users demand responsive, real-time interactive applications and companies need to store and analyze tons of data. Some years ago, monolithic code bases with a basic LAMP stack, some caching and perhaps a search engine were enough. These days everybody is talking about micro-services architectures, SOA, Erlang, Golang, message passing, queue systems and many more. PHP seems to not be cool anymore but... is this true? Should we all forget everything we know and just learn these new technologies? Do we really need all these things?
Why Node, Express and Postgres - presented 23 Feb 15, Talkjs, Microsoft Audit...Calvin Tan
How Node, Express and Postgres and help meet the challenges of building a scalable Web Service.
Node is event-oriented and able to take high load.
Express makes your code very simple and maintainable. Supports API-styled web service.
Postgres supports your data needs with a very flexible data structure.
This document provides a summary of Oracle OpenWorld 2014 discussions on database cloud, in-memory database, native JSON support, big data, and Internet of Things (IoT) technologies. Key points include:
- Database Cloud on Oracle offers pay-as-you-go pricing and self-service provisioning similar to on-premise databases.
- Oracle Database 12c includes an in-memory option that can provide up to 100x faster analytics queries and 2-4x faster transaction processing.
- Native JSON support in 12c allows storing and querying JSON documents within the database.
- Big data technologies like Oracle Big Data SQL and Oracle Big Data Discovery help analyze large and diverse data sets from sources like
From a student to an apache committer practice of apache io tdbjixuan1989
This talk is introduce by Xiangdong Huang, who is a PPMC of Apache IoTDB (incubating) project, at Apache Event at Tsinghua University in China.
About the Event:
The open source ecosystem plays more and more important role in the world. Open source software is widely used in operating systems, cloud computing, big data, artificial intelligence, and industrial Internet. Many companies have gradually increased their participation in the open source community. Developers with open source experience are increasingly valued and favored by large enterprises. The Apache Software Foundation is one of the most important open source communities, contributing a large number of valuable open source software and communities to the world.
The invited guests of this lecture are all from ASF community, including the chairman of the Apache Software Foundation, three Apache members, Top 5 Apache code committers (according to Apache annual report), the first Committer in the Hadoop project in China, several Apache project mentors or VPs, and many Apache Committers. They will tell you what the open source culture is, how to join the Apache open source community, and the Apache Way.
This document discusses quantifying the scalability of software. It recommends instrumenting code from the beginning to collect monitoring data on application health, the entire cluster, and individual nodes' system resources. This allows measuring how well a system can handle increasing load and evolving constraints.
This document summarizes Andreas Jung's presentation on the state of PrintCSS in 2023. It discusses the basics of PrintCSS, challenges in comparing different PrintCSS tools, an overview of free and commercial PrintCSS renderers, the role of JavaScript, common pain points, and decision criteria for choosing a PrintCSS renderer. The presentation provides an in-depth look at PrintCSS standards, tools, features, use cases, and recommendations.
PrintCSS W3C workshop at XMLPrague 2020Andreas Jung
1. Andreas Jung is a freelance consultant and developer who founded the print-css.rocks project in 2016 to provide vendor-neutral information about PrintCSS.
2. There are many incomplete and missing parts of the PrintCSS standard including issues with table splitting, floating, images, and support for JavaScript and multi-column layouts.
3. Key missing features from the standard include CSS exclusions, named page floating, hyphenation dictionaries, auto-sizing text to containers, consistent sidenote positioning, and tests to ensure consistent rendering behavior across tools.
Andreas Jung gives a presentation on PrintCSS, which uses CSS to control pagination and layout when converting XML or HTML to PDF. He discusses various PrintCSS tools and their features, provides examples of how PrintCSS is used, and highlights areas that still need improvement, such as standardization, JavaScript support, image positioning, and hyphenation. The ecosystem of PrintCSS tools is still limited with few free and open source options.
Plone 5.2 migration at University Ghent, BelgiumAndreas Jung
This talk summarizes our #Plone migration approach of the Plone installation at ugent.be. The migration process consists of the export of the original site to JSON using collective.jsonify, import of the data to ArangoDB and then back into a fresh Plone site through plone.restapi
This document discusses migrating 10 Plone sites from Plone 4.1/4.3 to Plone 5.1 using plone.restapi. The goals were a consistent look and feel, common code base with fewer dependencies, and consistent deployment. A custom provisioning API was built to handle site creation, content migration, and other tasks. The migration process extracted content from source sites and recreated it in the target Plone 5 sites using plone.api calls over HTTP. Most structures and content migrated automatically, with some manual work needed for default pages, collections, and other content. Lessons learned were that the approach was stable, reasonably fast, and could be adopted for other migrations.
Creating Content Together - Plone Integration with SMASHDOCsAndreas Jung
Plone Conference 2017 in Barcelona. Lightning talk .
Collaborative Content Creation solutions for content management systems or arbitrary web applications,
Creating Content Together - Plone Integration with SMASHDOCsAndreas Jung
Plone Conference 2017 in Barcelona. Lightning talk .
Collaborative Content Creation solutions for content management systems or arbitrary web applications,
Pyfilesystem provides a unified Python API for accessing various storage systems and file services. It abstracts away differences between storage APIs so that code works across systems without changes. Drivers exist for many systems including WebDAV, SFTP, S3, and local filesystems. The goal is for code to be unaware of the underlying storage type being used.
Building bridges - Plone Conference 2015 BucharestAndreas Jung
This document discusses integrative publishing solutions using Plone and external storage systems and document formats. It introduces the XML Director toolkit which provides unified access to external storages like S3, WebDAV, FTP through a common API. It allows mounting these storages in Plone and integrating them with Dexterity content. The document also discusses various document formats like DOCX, DITA, HTML, PDF, EPUB and tools for converting between these formats to support an XML-based publishing workflow in Plone.
Technology Trends in 2025: AI and Big Data AnalyticsInData Labs
At InData Labs, we have been keeping an ear to the ground, looking out for AI-enabled digital transformation trends coming our way in 2025. Our report will provide a look into the technology landscape of the future, including:
-Artificial Intelligence Market Overview
-Strategies for AI Adoption in 2025
-Anticipated drivers of AI adoption and transformative technologies
-Benefits of AI and Big data for your business
-Tips on how to prepare your business for innovation
-AI and data privacy: Strategies for securing data privacy in AI models, etc.
Download your free copy nowand implement the key findings to improve your business.
HCL Nomad Web – Best Practices and Managing Multiuser Environmentspanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-nomad-web-best-practices-and-managing-multiuser-environments/
HCL Nomad Web is heralded as the next generation of the HCL Notes client, offering numerous advantages such as eliminating the need for packaging, distribution, and installation. Nomad Web client upgrades will be installed “automatically” in the background. This significantly reduces the administrative footprint compared to traditional HCL Notes clients. However, troubleshooting issues in Nomad Web present unique challenges compared to the Notes client.
Join Christoph and Marc as they demonstrate how to simplify the troubleshooting process in HCL Nomad Web, ensuring a smoother and more efficient user experience.
In this webinar, we will explore effective strategies for diagnosing and resolving common problems in HCL Nomad Web, including
- Accessing the console
- Locating and interpreting log files
- Accessing the data folder within the browser’s cache (using OPFS)
- Understand the difference between single- and multi-user scenarios
- Utilizing Client Clocking
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfSoftware Company
Explore the benefits and features of advanced logistics management software for businesses in Riyadh. This guide delves into the latest technologies, from real-time tracking and route optimization to warehouse management and inventory control, helping businesses streamline their logistics operations and reduce costs. Learn how implementing the right software solution can enhance efficiency, improve customer satisfaction, and provide a competitive edge in the growing logistics sector of Riyadh.
Spark is a powerhouse for large datasets, but when it comes to smaller data workloads, its overhead can sometimes slow things down. What if you could achieve high performance and efficiency without the need for Spark?
At S&P Global Commodity Insights, having a complete view of global energy and commodities markets enables customers to make data-driven decisions with confidence and create long-term, sustainable value. 🌍
Explore delta-rs + CDC and how these open-source innovations power lightweight, high-performance data applications beyond Spark! 🚀
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPathCommunity
Join this UiPath Community Berlin meetup to explore the Orchestrator API, Swagger interface, and the Test Manager API. Learn how to leverage these tools to streamline automation, enhance testing, and integrate more efficiently with UiPath. Perfect for developers, testers, and automation enthusiasts!
📕 Agenda
Welcome & Introductions
Orchestrator API Overview
Exploring the Swagger Interface
Test Manager API Highlights
Streamlining Automation & Testing with APIs (Demo)
Q&A and Open Discussion
Perfect for developers, testers, and automation enthusiasts!
👉 Join our UiPath Community Berlin chapter: https://community.uipath.com/berlin/
This session streamed live on April 29, 2025, 18:00 CET.
Check out all our upcoming UiPath Community sessions at https://community.uipath.com/events/.
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxJustin Reock
Building 10x Organizations with Modern Productivity Metrics
10x developers may be a myth, but 10x organizations are very real, as proven by the influential study performed in the 1980s, ‘The Coding War Games.’
Right now, here in early 2025, we seem to be experiencing YAPP (Yet Another Productivity Philosophy), and that philosophy is converging on developer experience. It seems that with every new method we invent for the delivery of products, whether physical or virtual, we reinvent productivity philosophies to go alongside them.
But which of these approaches actually work? DORA? SPACE? DevEx? What should we invest in and create urgency behind today, so that we don’t find ourselves having the same discussion again in a decade?
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Aqusag Technologies
In late April 2025, a significant portion of Europe, particularly Spain, Portugal, and parts of southern France, experienced widespread, rolling power outages that continue to affect millions of residents, businesses, and infrastructure systems.
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025BookNet Canada
Book industry standards are evolving rapidly. In the first part of this session, we’ll share an overview of key developments from 2024 and the early months of 2025. Then, BookNet’s resident standards expert, Tom Richardson, and CEO, Lauren Stewart, have a forward-looking conversation about what’s next.
Link to recording, transcript, and accompanying resource: https://bnctechforum.ca/sessions/standardsgoals-for-2025-standards-certification-roundup/
Presented by BookNet Canada on May 6, 2025 with support from the Department of Canadian Heritage.
TrsLabs - Fintech Product & Business ConsultingTrs Labs
Hybrid Growth Mandate Model with TrsLabs
Strategic Investments, Inorganic Growth, Business Model Pivoting are critical activities that business don't do/change everyday. In cases like this, it may benefit your business to choose a temporary external consultant.
An unbiased plan driven by clearcut deliverables, market dynamics and without the influence of your internal office equations empower business leaders to make right choices.
Getting things done within a budget within a timeframe is key to Growing Business - No matter whether you are a start-up or a big company
Talk to us & Unlock the competitive advantage
This is the keynote of the Into the Box conference, highlighting the release of the BoxLang JVM language, its key enhancements, and its vision for the future.
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc
Most consumers believe they’re making informed decisions about their personal data—adjusting privacy settings, blocking trackers, and opting out where they can. However, our new research reveals that while awareness is high, taking meaningful action is still lacking. On the corporate side, many organizations report strong policies for managing third-party data and consumer consent yet fall short when it comes to consistency, accountability and transparency.
This session will explore the research findings from TrustArc’s Privacy Pulse Survey, examining consumer attitudes toward personal data collection and practical suggestions for corporate practices around purchasing third-party data.
Attendees will learn:
- Consumer awareness around data brokers and what consumers are doing to limit data collection
- How businesses assess third-party vendors and their consent management operations
- Where business preparedness needs improvement
- What these trends mean for the future of privacy governance and public trust
This discussion is essential for privacy, risk, and compliance professionals who want to ground their strategies in current data and prepare for what’s next in the privacy landscape.
AI and Data Privacy in 2025: Global TrendsInData Labs
In this infographic, we explore how businesses can implement effective governance frameworks to address AI data privacy. Understanding it is crucial for developing effective strategies that ensure compliance, safeguard customer trust, and leverage AI responsibly. Equip yourself with insights that can drive informed decision-making and position your organization for success in the future of data privacy.
This infographic contains:
-AI and data privacy: Key findings
-Statistics on AI data privacy in the today’s world
-Tips on how to overcome data privacy challenges
-Benefits of AI data security investments.
Keep up-to-date on how AI is reshaping privacy standards and what this entails for both individuals and organizations.
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxAnoop Ashok
In today's fast-paced retail environment, efficiency is key. Every minute counts, and every penny matters. One tool that can significantly boost your store's efficiency is a well-executed planogram. These visual merchandising blueprints not only enhance store layouts but also save time and money in the process.
2. Why we ♥
/about
• Python developer since 1993
• Freelancer since 2004
• Python, Zope, Plone …
• custom software development
• Electronic Publishing
(Publishing workflows DOCX→XML→PDF | EPUB | HTML,
XML consulting)
• Founded publishing projects
• XML-Director
• Produce & Publish
3. Why we ♥
Disclaimer
• This talk is completely
• biased
• opinionated
• unscientific
• not affiliated with ArangoDB GmbH
4. Why we ♥
Relational databases
• well understood
• common data model
• long history:
• System R (1974)
• Oracle (1979)
• Structured Query Language (Standards: ISO/IEC 9075 + 13249)
• theoretically interoperable if you stick to the SQL standard
7. Why we ♥
“NoSQL is not about performance, scaling, dropping ACID or hating SQL — it is about choice. As NoSQL databases are somewhat different, it does not help very much to compare the databases by their throughput and choose the one which is faster. Instead, the user should carefully think about his overall requirements and weigh the different aspects. Massively scalable key/value stores or memory-only systems can achieve much higher benchmarks. But your aim is to provide a much more convenient system for a broader range of use cases — which is fast enough for almost all cases.”
Jan Lehnardt (CouchDB)
9. Why we ♥
New challenges
• Cloud
• Replication
• massive data explosion: "Big Data"
• Globally distributed systems
• Specialized requirements
➡ more specialized databases
➡ Relational databases are no longer the only option
10. Why we ♥
CAP Theorem
It is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
• Consistency (every read receives the most recent write or an error)
• Availability (every request receives a response, without guarantee that it contains the most recent version of the information)
• Partition tolerance (the system continues to operate despite arbitrary partitioning due to network failures)
Pick two. (Eric Brewer)
11. Why we ♥
My personal hunt for a multi-purpose
NoSQL database …
• Should fit most mid-size projects
• Document store (+ graphs)
• Arbitrary query options
• Cross-table/collection relationships
• (optional) transactional integrity (ACID)
across multiple documents and operations
• replication/clustering
12. Why we ♥
My personal hunt…
…and various others
15. Why we ♥
The high-end $$$$ solution
• the most professional, feature-complete,
feature-rich NoSQL database ever
• document (XML/JSON) store and graph database
• focus on data integration and data consolidation
• expensive but worth the money if you need the features
• widely used in enterprises
(saved the "Obama-Care" project)
16. Why we ♥
A native multi-model database
• Document store (JSON)
• JOINs, secondary indexes, ACID transactions
• Key-value store
• Graph database
• integrates with document store
• rich graph query operations
• nodes and edges can contain complex data
➡ all models can be combined
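A minimal sketch of how the models combine, written for the arangosh shell; the collection names (users, knows) and attributes are illustrative, not taken from the talk:

```js
// arangosh sketch: documents and graph edges in one database,
// queried together with a single AQL traversal
const db = require('@arangodb').db;

db._create('users');               // document collection
db._createEdgeCollection('knows'); // edge collection for the graph

db.users.save({ _key: 'alice', name: 'Alice' });
db.users.save({ _key: 'bob', name: 'Bob' });
db.knows.save({ _from: 'users/alice', _to: 'users/bob', since: 2015 });

// Traverse the graph and return data from the documents and the edge
db._query(`
  FOR v, e IN 1..1 OUTBOUND 'users/alice' knows
    RETURN { friend: v.name, since: e.since }
`).toArray();
```

The point of the slide in code form: nodes and edges are ordinary JSON documents, so the graph traversal and the document store share one query.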
17. Why we ♥
Foxx framework
• implement your own REST microservices directly with
JavaScript running inside ArangoDB
• unified data storage logic (decouples API from external services)
• reduced network overhead (no network round trips between service and data)
• you can use the full JS Stack
• batteries included
• built-in job queue
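A minimal Foxx service sketch (the main.js of a service); the route and collection names are illustrative assumptions, not from the talk:

```js
'use strict';
// Foxx microservice: a REST endpoint served from inside ArangoDB
const createRouter = require('@arangodb/foxx/router');
const db = require('@arangodb').db;

const router = createRouter();
module.context.use(router);

// GET /users/:key — reads the document directly, no extra
// network hop between an app server and the database
router.get('/users/:key', (req, res) => {
  const doc = db.users.document(req.pathParams.key);
  res.json({ name: doc.name });
});
```

Because the handler runs in the database process, the storage logic stays next to the data and external clients only see the REST API.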
18. Why we ♥
AQL - One Query Language to rule them all
• AQL = ArangoDB Query Language
• declarative, human-readable DSL (I hate JSON queries)
• document queries, graph queries, joins, all combined in
one statement
• ACID support with multi-collection transactions
• easy to understand with some SQL background
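A hedged sketch of what such a combined statement looks like, run from arangosh; the collections (orders, customers) and attributes are made up for illustration:

```js
// arangosh sketch: an AQL join across two document collections.
// Reads like SQL, but returns nested JSON instead of flat rows.
const db = require('@arangodb').db;

db._query(`
  FOR order IN orders
    FILTER order.total > 100
    FOR customer IN customers
      FILTER customer._key == order.customer
      RETURN { customer: customer.name, total: order.total }
`).toArray();
```

The same FOR/FILTER/RETURN building blocks also drive graph traversals, which is what lets one statement mix document queries, joins, and graph queries.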
32. Why we ♥
Misc
• current version: 3.0 (3.1 RC3)
• good documentation
• regular updates and fixes
• well supported by ArangoDB GmbH in Cologne
33. Why we ♥
• Community Edition (Apache License 2.0)
• Enterprise Edition (SLA, support options,
smart graphs, auditing, better security control)
https://www.arangodb.com/why-arangodb/references/
https://www.arangodb.com/arangodb-drivers/