A talk on two Elasticsearch topics, multitenancy and scalability, and the technical details of achieving both efficiently.
Encryption is widely used by companies to secure sensitive data. It comes in different varieties and purposes. There's symmetric vs asymmetric encryption, there's encryption at rest, in transit and in use, there's TDE vs record-level encryption vs column/field level encryption, and then there's key-encryption (wrapping). All of these varieties serve different purposes and use-cases that we review - from the point of view of an infosec person, a sysadmin, a developer and an architect.
Elasticsearch is a distributed and highly available search engine that allows for multiple indexes and types within indexes. It provides RESTful and Java APIs to interface with the clusters as well as reliable asynchronous writing and real-time search capabilities. Elasticsearch is built on Lucene and is open source under the Apache 2 license.
DevCon Summit 2014 #DevelopersUnitePH: The "What" and "Why" of NoSQL by Matia... DEVCON
This document provides an overview of NoSQL databases by telling a story about a company that needed a new database solution to support growing user numbers, data size, and write throughput for a social media aggregation project. It explains that traditional SQL databases may not meet the needs of modern web applications that generate large amounts of structured and unstructured data very quickly. It then gives brief descriptions of different NoSQL database categories including key-value stores, document databases, BigTable databases, and search engines.
"TextMining with ElasticSearch", Saskia Vola, CEO at textminers.ioDataconomy Media
This document discusses using ElasticSearch for text mining tasks such as information extraction, sentiment analysis, keyword extraction, classification, and clustering. It describes how ElasticSearch can be used to perform linguistic preprocessing including tokenization, stopword removal, and stemming on text data. Additionally, it mentions plugins for language detection and clustering search results. The document provides an example of training a classification model using an index with content and category fields, and evaluating the model's performance on news text categorization.
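The preprocessing steps named above (tokenization, stopword removal, stemming) can be illustrated with a minimal sketch in plain Python. It is not Elasticsearch's analyzer chain: the stopword list, the suffix list, and the function names `tokenize`, `stem`, and `analyze` are assumptions made for illustration, and the suffix stripping is far cruder than a real stemmer.

```python
import re

# Hypothetical, deliberately tiny stopword and suffix lists (assumptions).
STOPWORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in", "on"}
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Tokenize, drop stopwords, then stem what remains."""
    return [stem(t) for t in tokenize(text) if t not in STOPWORDS]


print(analyze("Searching indexed documents and ranking results"))
# ['search', 'index', 'document', 'rank', 'result']
```

In an actual Elasticsearch setup these steps would normally be configured as an analyzer on the index rather than written by hand.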
The document describes a generic crawler that can crawl websites without APIs by using rules to extract data. It discusses the crawler's infrastructure, introduction to crawler rules using XPATH and CSS expressions, the crawl procedure of generating links, crawling based on links and saving data to a local DB, and limitations such as not working on AJAX sites. The goal is to build a multipurpose crawler powered by cloud computing that can extract information from various websites.
This document provides an overview of configuration options in Azure, including application settings, App Configuration, Key Vault, and Managed Identities for Azure Resources. It begins with an introduction to configuration and then discusses each option in more detail, providing demos of application settings, App Configuration, and Key Vault. The document emphasizes that these tools can help centralize and secure configuration across environments while simplifying administration.
This document provides an introduction to cryptography concepts including symmetric encryption, asymmetric encryption, hash functions, and common attacks on cryptographic systems. It begins with an introduction of the author and then defines cryptography as the practice of encryption and decryption. It explains the basic concepts of symmetric encryption using the same key for encryption and decryption, asymmetric encryption using public and private key pairs, and hash functions. It provides examples of implementations and uses of these cryptographic methods. Finally, it outlines some common attacks against symmetric, asymmetric cryptography and hash functions.
The document discusses using structured and semi-structured data together in modern applications. It describes how MariaDB Server handles both types of data through features like dynamic columns and JSON functions that allow storing and querying JSON documents. The presentation provides examples of defining, creating, reading, updating, and constraining both JSON documents and dynamic columns to bring the flexibility of semi-structured data to the reliability of relational databases.
Big Data has become the new buzzword like “Agile” and “Cloud”. Like those two others, it’s a transformative technology. We’ll be discussing:
• What is it?
• Technology key words
• HDFS
• Hadoop
• MapReduce
This will be part 1 of 2 (at least). This first talk will not be overly technical. We’ll go over the concepts and terms you’ll encounter when considering a big data solution.
Active Directory is a centralized directory service that stores objects like users, groups, computers, and policies. It provides security and simplifies administration. Groups contain users/computers and help apply policies. Group policies centrally manage settings. Organizational units logically organize objects and delegate administration. Trusts allow access between domains. From an attacker's perspective, they would get an initial foothold, enumerate privileged accounts and permissions, and exploit any misconfigurations to escalate privileges like taking over accounts. They could also use trusts to access other domains.
Securing data and preventing data breaches MariaDB plc
The document discusses various database security threats and best practices for securing MariaDB and MySQL deployments. It describes common attacks like SQL injection, denial of service attacks, and provides recommendations for defense including using database firewalls, limiting user privileges, encrypting sensitive data, and auditing. MariaDB MaxScale is highlighted as a way to implement features like connection pooling, query filtering, and data masking to enhance security.
This presentation is dedicated to Microsoft Azure. It contains an overview of the main trends in the development of Microsoft Azure, and the solutions that Microsoft offers with this product. There are also insights on its further development, and an impact it is going to bring to the market.
This presentation by Andriy Gnennyy (Senior Consultant, GlobalLogic Kharkiv) was delivered at GlobalLogic Kharkiv MS TechTalk on June 13, 2017.
Fast, powerful and scalable analytics can provide many business benefits including getting more value from data, faster decision making, cost reduction, and developing new products and services. There are four main types of analytics: descriptive (what happened), diagnostic (why did it happen), predictive (what will happen), and prescriptive (what action should be taken). MariaDB AX is a big data analytics solution that provides real-time analytics capabilities, built-in analytics functions, and easier management and scaling on commodity hardware at a lower cost than other solutions. It allows for both transactional and analytical processing using a single SQL interface.
The audience will learn how to use ElasticSearch efficiently and reliably as data scales up in their applications. The talk covers tuning ElasticSearch and configuring its internal queues and buffers for heavy indexing. Another takeaway is some insight into the internals of ElasticSearch.
This document provides an introduction to FaunaDB, a next-generation cloud database. It discusses what FaunaDB is, why it was created, how it compares to traditional SQL databases, and provides a quick overview of the Fauna Query Language (FQL). Key points include that FaunaDB combines the simplicity of NoSQL with the ability to model complex relationships, provides ACID transactions, and supports various data modeling approaches including graph, relational, temporal and document structures.
This document summarizes an agenda for a presentation on advanced ColdBox REST techniques. The presentation covers tools for REST development, best practices for API design including resource naming, documentation, HTTP verbs, status codes, modularity, uniformity, performance through caching, security, and testing APIs through a BDD approach. The presenter is introduced as Luis Majano, founder of Ortus Solutions and creator of ColdBox and other Box frameworks.
This document contains a presentation on security in FaunaDB. It provides an overview of FaunaDB, compares it to traditional RDBMS, introduces FQL, and discusses security features including access control models, authentication using keys and tokens, JSON web tokens, and attribute-based access control. It also covers security resources like secrets, roles, and privileges and how to create custom roles and update roles in FaunaDB.
Internet of Things Cologne 2015: MongoDB Technical Presentation MongoDB
The value of the fast-growing class of NoSQL databases lies in their ability to handle high-velocity, high-volume data while enabling greater agility through dynamic schemas. MongoDB gives you those benefits while also providing rich querying capabilities and a document model for developer productivity. Learn about the reasons for MongoDB's popularity, and how you can leverage the core concepts of NoSQL to build robust and highly scalable IoT applications.
Test driving Azure Search and DocumentDB Andrew Siemer
This document provides an overview and comparison of DocumentDB and Azure Search. It discusses what NoSQL and search are, when each service is better to use, how to set up and structure data in each, and examples of querying. DocumentDB is described as a NoSQL database that uses a flexible JSON document structure and scales easily. Azure Search is an elastic search service that indexes and scores search results. The document provides examples of setting up databases and indexes, adding and querying data, and considerations for different field types and scoring profiles. It also discusses where each service may fit in different parts of an application architecture.
Building enterprise records management solutions for share point 2010 Eric Shupps
This document summarizes options for building enterprise records management solutions in SharePoint 2010. It discusses what records management is, including the need to tightly control and audit records. It outlines two main options for handling records - in-place within existing sites or in a separate records center. It covers features like document IDs, content organizers, and document conversion that can help with records management. It also discusses holds and discovery for legal processes like eDiscovery. Finally, it briefly touches on retention policies.
Private keys and sensitive data should be stored securely and not in plaintext. Options for storage include encrypting the private key into a PKCS12 file, storing it in the user's keystore protected by access controls, executing cryptographic processes on a private network with secure protocols, using white box cryptography which obscures the key, or storing on external hardware devices.
Sebastian Cohnen – Building a Startup with NoSQL - NoSQL matters Barcelona 2014 NoSQLmatters
Sebastian Cohnen – Building a Startup with NoSQL
At StormForger we use several NoSQL systems to handle all kinds of different data. We have a lot of time series data because we do load testing and performance analysis of HTTP-based infrastructure and services. For time series data, we use InfluxDB. We also use several Redis instances for caching and for storing structured data that needs fast read and write access. Lately we also started to integrate ArangoDB into our architecture, which is a perfect fit for storing and working with our complex test case definition data structures. In this talk I’d like to present how we built our startup on the foundation provided by several NoSQL databases, how we came to choose those systems and how we use them.
Discover the available features through demos: cross-cluster replication, Elasticsearch frozen indices, Kibana spaces, and integration data in Beats and Logstash.
This document summarizes a presentation on getting started with SQLite. The presentation covers what SQLite is, who uses it, where it can be used, its key features, constraints and functions. It concludes with a live demo of installing SQLite, creating databases and tables, and performing basic operations using a database manager tool.
Global introduction to Elasticsearch presented at BigData meetup.
Use cases, getting started, Rest CRUD API, Mapping, Search API, Query DSL with queries and filters, Analyzers, Analytics with facets and aggregations, Percolator, High Availability, Clients & Integrations, ...
This document discusses using Elasticsearch, Azure, and Episerver together for search capabilities on the Evira website. Key points:
1) Elasticsearch provides global search and efficient querying of large datasets. Azure provides the cloud platform and Episerver is used for content editing and as the master data store.
2) Real-time indexing from Episerver events into Elasticsearch provides search results with 1-2 second latency.
3) CQRS pattern is used where commands update Episerver and queries are handled by Elasticsearch for better performance on large datasets.
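The command/query split in the points above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the Evira implementation: `ContentStore` stands in for the CMS that owns the master data, `SearchIndex` is an in-memory inverted index standing in for the search engine, and all class and method names are hypothetical.

```python
from collections import defaultdict


class ContentStore:
    """Command side: holds the master copy of each page (stands in for the CMS)."""

    def __init__(self):
        self._pages = {}
        self._subscribers = []

    def subscribe(self, handler):
        """Register a callback that is invoked after every save."""
        self._subscribers.append(handler)

    def save_page(self, page_id, text):
        """Apply the command, then publish a 'page saved' event."""
        self._pages[page_id] = text
        for handler in self._subscribers:
            handler(page_id, text)


class SearchIndex:
    """Query side: a toy inverted index kept up to date from save events."""

    def __init__(self):
        self._terms = defaultdict(set)   # term -> ids of pages containing it
        self._indexed = {}               # page id -> terms currently indexed

    def on_page_saved(self, page_id, text):
        # Remove the page's stale terms before indexing the new text.
        for term in self._indexed.get(page_id, set()):
            self._terms[term].discard(page_id)
        terms = set(text.lower().split())
        for term in terms:
            self._terms[term].add(page_id)
        self._indexed[page_id] = terms

    def search(self, term):
        return sorted(self._terms.get(term.lower(), set()))


store = ContentStore()
index = SearchIndex()
store.subscribe(index.on_page_saved)

store.save_page("food-safety", "Food safety guidance")
store.save_page("animal-health", "Animal health and safety")
print(index.search("safety"))   # ['animal-health', 'food-safety']

store.save_page("animal-health", "Animal health")
print(index.search("safety"))   # ['food-safety']
```

Here the event handler runs synchronously; in a real deployment the event would typically travel asynchronously, which is where an indexing latency such as the 1-2 seconds mentioned above comes from.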
Episerver Find is an event-driven search engine built on top of Elasticsearch that is well-suited for Episerver projects. It separates commands and queries using CQRS, with Episerver handling simple queries and Elasticsearch handling more complex queries for improved performance. Choosing the right tools like Episerver for content management, Elasticsearch for search, and a customizable cloud platform allows building a scalable solution for projects of any size.
Here are the slides for my talk "An intro to Azure Data Lake" at Techorama NL 2018. The session was held on Tuesday October 2nd from 15:00 - 16:00 in room 7.
This document summarizes a presentation about using Elasticsearch for analytics on customer communities. It discusses how Lithium uses Elasticsearch to analyze terabytes of daily social media data to understand customer participation and content that generates engagement. It also provides details about Lithium's large Elasticsearch cluster containing over 7 billion documents, and lessons learned around bulk loading, faceting, and settings for data centers.
Overview of data analytics service: Treasure Data Service SATOSHI TAGOMORI
Treasure Data provides a data analytics service with the following key components:
- Data is collected from various sources using Fluentd and loaded into PlazmaDB.
- PlazmaDB is the distributed time-series database that stores metadata and data.
- Jobs like queries, imports, and optimizations are executed on Hadoop and Presto clusters using queues, workers, and a scheduler.
- The console and APIs allow users to access the service and submit jobs for processing and analyzing their data.
This document provides an overview and summary of key concepts related to advanced databases. It discusses relational databases including MySQL, SQL, transactions, and ODBC. It also covers database topics like triggers, indexes, and NoSQL databases. Alternative database systems like graph databases, triplestores, and linked data are introduced. Web services, XML, and data journalism are also briefly summarized. The document provides definitions and examples of these technical database terms and concepts.
Colorado Springs Open Source Hadoop/MySQL David Smelker
This document discusses MySQL and Hadoop integration. It covers structured versus unstructured data and the capabilities and limitations of relational databases, NoSQL, and Hadoop. It also describes several tools for integrating MySQL and Hadoop, including Sqoop for data transfers, MySQL Applier for streaming changes to Hadoop, and MySQL NoSQL interfaces. The document outlines the typical life cycle of big data with MySQL playing a role in data acquisition, organization, analysis, and decisions.
Webinar Slides: Tungsten Replicator for Elasticsearch - Real-time data loadin... Continuent
Elasticsearch provides a quick and easy method to aggregate data, whether you want to use it for simplifying your search across multiple depots and databases, or as part of your analytics stack. Getting the data from your transactional engines into Elasticsearch is something that can be achieved within your application layer with all of the associated development and maintenance costs. Instead, offload the operation and simplify your deployment by using direct data replication to handle the insert, update and delete processes.
AGENDA
- Basic replication model
- How to concentrate data from multiple sources
- How the data is represented within Elasticsearch
- Customizations and configurations available to tailor the data format
- Filters and data modifications available
Module 2.2 Introduction to NoSQL Databases.pptx NiramayKolalle
This presentation explores NoSQL databases, a modern alternative to traditional relational database management systems (RDBMS). NoSQL databases are designed to handle large-scale data storage and high-speed processing with a focus on flexibility, scalability, and performance. Unlike SQL databases, NoSQL solutions do not rely on structured tables, schemas, or joins, making them ideal for handling Big Data applications and distributed systems.
Introduction to NoSQL Databases:
NoSQL databases are built on the following core principles:
Schema-Free Structure: No predefined table structures, allowing dynamic data storage.
Horizontal Scalability: Unlike SQL databases that scale vertically (by increasing hardware power), NoSQL databases support horizontal scaling, distributing data across multiple servers.
Distributed Computing: Data is stored across multiple nodes, preventing single points of failure and ensuring high availability.
Simple APIs: NoSQL databases often use simpler query mechanisms instead of complex SQL queries.
Optimized for Performance: NoSQL databases eliminate joins and support faster read/write operations.
Key Theoretical Concepts:
CAP Theorem (Brewer’s Theorem)
The CAP theorem states that a distributed system can provide only two out of three guarantees:
Consistency (C) – Ensures that all database nodes show the same data at any given time.
Availability (A) – Guarantees that every request receives a response.
Partition Tolerance (P) – The system continues to operate even if network failures occur.
Most NoSQL databases prioritize Availability and Partition Tolerance (AP) while relaxing strict consistency constraints, unlike SQL databases that focus on Consistency and Availability (CA).
BASE vs. ACID Model
SQL databases follow the ACID (Atomicity, Consistency, Isolation, Durability) model, ensuring strict transactional integrity. NoSQL databases use the BASE model (Basically Available, Soft-state, Eventually consistent), allowing flexibility in distributed environments where eventual consistency is preferred over immediate consistency.
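As a rough illustration of "soft state" and "eventually consistent", the sketch below models two replicas that accept writes independently and converge after a sync. It is a simplified assumption-laden toy: the `Replica` class, the integer version numbers, and the last-write-wins rule are chosen for illustration and do not describe any specific NoSQL product.

```python
class Replica:
    """One node holding its own copy of the data (hypothetical toy model)."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (version, value)

    def write(self, key, value, version):
        """Keep the write only if it is newer than what this replica has."""
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        """Return the locally known value, which may be stale."""
        entry = self.data.get(key)
        return entry[1] if entry else None

    def sync_from(self, other):
        """Anti-entropy step: pull every entry the other replica holds."""
        for key, (version, value) in other.data.items():
            self.write(key, value, version)


a = Replica("a")
b = Replica("b")

a.write("cart", ["book"], version=1)
print(b.read("cart"))   # None: replica b has not seen the write yet

b.sync_from(a)
print(b.read("cart"))   # ['book']: the replicas have converged
```

Between the write and the sync, a reader of replica `b` sees stale data; an ACID system would instead make the committed write visible to all readers at once.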
Types of NoSQL Databases:
Key-Value Stores – Store data as simple key-value pairs, making them highly efficient for caching, session management, and real-time analytics.
Examples: Amazon DynamoDB, Redis, Riak
Column-Family Stores – Store data in columns rather than rows, optimizing analytical queries and batch processing workloads.
Examples: Apache Cassandra, HBase, Google Bigtable
Document Stores – Use JSON, BSON, or XML documents to represent data, making them ideal for content management systems, catalogs, and flexible data models.
Examples: MongoDB, CouchDB, ArangoDB
Graph Databases – Focus on relationships between data, allowing high-performance queries for connected data such as social networks, fraud detection, and recommendation engines.
Examples: Neo4j, Oracle NoSQL Graph, Amazon Neptune
Business Drivers for NoSQL Adoption:
Volume: The ability to process large datasets effic
Presto is an open source distributed SQL query engine that was originally developed by Facebook. It allows for fast SQL queries on large datasets across multiple data sources. Presto uses various optimizations like code generation, predicate pushdown, and data layout awareness to improve query performance. It is used at Facebook and other companies for interactive analytics, batch ETL, A/B testing, and app analytics where low latency and high concurrency are important.
An overview of various database technologies and their underlying mechanisms over time.
Presentation delivered internally at Alliander to inspire the use of and foster interest in new (NoSQL) technologies. 18 September 2012
Big Data Architecture Workshop - Vahid Amiri datastack
Big Data Architecture Workshop
This slide deck is about big data tools, technologies and layers that can be used in enterprise solutions.
TopHPC Conference
2019
Talk given for the #phpbenelux user group on March 27th in Gent (BE), with the goal of convincing developers who are used to building PHP/MySQL apps to broaden their horizons when adding search to their site. Be sure to also have a look at the notes for the slides; they explain some of the screenshots, etc.
An accompanying blog post about this subject can be found at http://www.jurriaanpersyn.com/archives/2013/11/18/introduction-to-elasticsearch/
In the public sector (in other words, the state), a great deal of software is built on which millions of leva depend and around which major interests are concentrated. It is naive to assume that corruption is eliminated simply because something is electronic. On the contrary, if it is not done properly, software can even create a new kind of "electronic" corruption. That is why we need to apply several not-so-simple principles and techniques in order to get anti-corruption software.
The document discusses how network, endpoint, cloud, custom development, off-the-shelf, and security tool security are all difficult to implement effectively. It notes that attacks can come from many vectors, including supply chains, social engineering, and zero-day exploits. The document argues that while nothing is perfectly secure, organizations must work to manage risks through long-term policies, trained security personnel, standardization, vendor responsibility, and limiting zero-day vulnerabilities. The overarching message is that cybersecurity is a complex challenge due to the inherent insecurity of modern computing systems and interconnected networks.
Blockchain overview - types, use-cases, security and usability Bozhidar Bozhanov
This document provides an overview of blockchain technology, including its types, use cases, security, and usability. It discusses the key components of blockchain like hash chains, Merkle trees, and consensus models. It outlines the main types of blockchain solutions and some important features like immutability and decentralization. The main drawbacks of public blockchains are that they are expensive, volatile, and not scalable. The document also discusses security considerations and challenges with usability. It analyzes some proposed use cases for blockchain technology and their issues. In conclusion, it states that cryptography enables data integrity, and the right tool should be chosen for each job.
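The hash chain mentioned above, and the way cryptography gives data integrity, can be shown with a minimal Python sketch. It covers only the chaining idea, with no Merkle trees or consensus; the block layout, the `GENESIS_HASH` placeholder, and the function names are assumptions for illustration rather than any particular blockchain's format.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder "previous hash" for the first block


def block_hash(index, data, prev_hash):
    """Hash the block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data):
    """Append a new block that links to the current tip of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append(
        {
            "index": index,
            "data": data,
            "prev_hash": prev_hash,
            "hash": block_hash(index, data, prev_hash),
        }
    )


def is_valid(chain):
    """Recompute every hash and check each block links to its predecessor."""
    prev_hash = GENESIS_HASH
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], prev_hash):
            return False
        prev_hash = block["hash"]
    return True


chain = []
for entry in ["alice pays bob 5", "bob pays carol 2", "carol pays dave 1"]:
    append_block(chain, entry)

print(is_valid(chain))   # True: the untouched chain verifies

chain[1]["data"] = "bob pays carol 200"
print(is_valid(chain))   # False: the stored hash no longer matches the data
```

Because each hash covers the previous block's hash, changing any earlier entry invalidates every block after it unless all of them are recomputed.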
This document discusses scaling applications horizontally on AWS. It provides an overview of AWS services like EC2, load balancers, auto-scaling groups, and databases that can be used to build scalable architectures in the cloud. Key recommendations include making applications stateless, using distributed caching, executing jobs only once in a cluster, and automating infrastructure provisioning with tools like CloudFormation. The document also covers blue-green deployments and general best practices for security and availability.
GDPR for developers is a document about the General Data Protection Regulation (GDPR) and what software developers need to do to comply. It discusses key GDPR concepts like the rights of data subjects, lawful processing of personal data, and security measures. It provides practical advice for implementing GDPR principles in software like obtaining consent, handling data subject requests, and responding to data breaches. The overall message is that GDPR compliance requires changes to protect personal data but many can be done incrementally without a full rewrite.
1. Message queues allow for decoupling of systems, asynchronous processing, fault tolerance, and event-driven architectures. They come in different types including broker-based, brokerless, in-memory, and database-backed.
2. Common queue technologies include JMS, AMQP, Kafka, ZeroMQ, and in-memory options like Hazelcast. Exactly-once delivery, order guarantees, persistence, and transactions add complexity.
3. Simpler solutions may suffice depending on requirements, such as synchronous calls within an application, asynchronous processing without a queue, or using a database for queueing. The simplest reliable solution is preferred.
Electronic governance steps in the right direction?Bozhidar Bozhanov
This document discusses electronic governance initiatives in Bulgaria. It provides an overview of the country's efforts to modernize government services and move processes online. Key points include developing an electronic identification system for citizens, making government data openly available, integrating systems to allow data sharing between agencies, and piloting electronic voting with security and verifiability standards. The approach focuses on agile development, engaging experts and citizens, prioritizing open source solutions, and setting legal and technical standards to help transition government services and ensure long-term sustainability of these reforms.
Какви са проблемите със сигурността на електронното управление, и как би помогнал отворените код, електронната идентификация и общите правила за достъп
This document discusses Bulgaria's Electronic Governance Act which mandates that all new custom-built government software be open source and developed in a public repository. The goals are to increase transparency, quality, and cost-effectiveness of government software projects while reducing vendor lock-in. Potential issues that could arise include a lack of enforcement, companies developing software privately instead of publicly, and certain agencies claiming the law does not apply to them. Overall it is still too early to say whether the approach will be successful, but sharing experience with other countries' open source policies provides optimism.
This document discusses biometric identification and its uses and challenges. It describes how biometrics like fingerprints, iris scans, and DNA can be used to identify individuals but also how current biometric systems have security flaws. Centralized biometric databases are vulnerable if hacked and fingerprints can be fooled. The document proposes a future where biometric hashes combined with passwords provide secure, anonymous digital identities without centralized databases's risks.
2. ABOUT ME
• Founder at LogSentinel, an information security startup
• LogSentinel SIEM – product that indexes billions of logs with Elasticsearch
• https://techblog.bozho.net
• https://twitter.com/bozhobg
3. SCALABILITY AND MULTITENANCY
• Scalability – how to process millions (billions) of documents on multiple machines
• Multitenancy – how to have our system support multiple users/organizations while segregating their data
• One can exist without the other
• Both are architectural and implementation tasks, not (just) work for Ops.
• Anti-pattern: “We’ll push the data in whatever form and Ops will take care of the scaling”
4. ELASTICSEARCH BASICS
• “You know, for search”
• Indexing documents (document = anything)
• Full-text search and keyword search
• Allows for large clusters
• Licensing issues
5. USE-CASE: TIME-SERIES DATA
• Indexing events (logs, metrics, etc.)
• Wide-spread and widely applicable scenario
• Documents almost always have a timestamp
8. LIMITING FACTORS
• One shard shouldn’t be too large
  • Ideally between 10 and 50 GB; with larger shards, recovery after failure may be slow or may not work
• The number of shards on a node is limited by RAM
• Lucene segments are immutable once written; new data goes into new segments
• A large number of segments reduces performance
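The shard-size guideline above can be turned into a quick back-of-the-envelope calculation. This is a minimal sketch, assuming you can estimate the final size of an index in advance; the function name and the 30 GB default target are illustrative choices, not values from the talk.

```python
import math


def primary_shards_for(expected_index_gb: float, target_shard_gb: float = 30.0) -> int:
    """Smallest number of primary shards that keeps each shard at or below the target size."""
    if expected_index_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(expected_index_gb / target_shard_gb))


if __name__ == "__main__":
    expected_gb = 400.0  # hypothetical estimate for one index
    shards = primary_shards_for(expected_gb)
    print(f"{shards} primary shards, about {expected_gb / shards:.1f} GB each")
```

Since the number of primary shards is fixed for a running index, this estimate has to be made before the index is created.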
9. MULTITENANCY
• Cluster-per-tenant
  • Heavy to administer
  • No real multitenancy
  • Expensive
• Index-per-tenant
  • Also heavy to administer
  • Doesn’t scale well
• Tenant-based routing
  • Recommended in most cases
10. TENANT-BASED ROUTING
• _routing=<tenantId> or _routing=<tenantOwnedResourceId>
  • E.g. userId or dataSourceId
• Routing parameter designates which shard to be used for storing the document
• _routing for search requests tells Elasticsearch where to look for the data => faster search
• shard_num = hash(_routing) % num_primary_shards
• mappings._routing.required: true
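The routing formula above can be simulated to see why all of a tenant's documents end up on one shard. This is a minimal sketch: Elasticsearch applies its own internal hash function, and CRC32 is used here only as a stand-in, so the shard numbers will not match a real cluster. The tenant IDs are made up.

```python
import zlib
from collections import defaultdict


def shard_for(routing: str, num_primary_shards: int) -> int:
    """shard_num = hash(_routing) % num_primary_shards, with CRC32 as a stand-in hash."""
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def group_by_shard(docs, num_primary_shards):
    """Group documents by the shard their tenantId routes to."""
    shards = defaultdict(list)
    for doc in docs:
        shards[shard_for(doc["tenantId"], num_primary_shards)].append(doc)
    return dict(shards)


if __name__ == "__main__":
    docs = [
        {"tenantId": "tenant-a", "message": "login"},
        {"tenantId": "tenant-b", "message": "logout"},
        {"tenantId": "tenant-a", "message": "export"},
    ]
    for shard, items in sorted(group_by_shard(docs, 5).items()):
        print(shard, [d["message"] for d in items])
```

Because the same routing value always maps to the same shard, a search that supplies the tenant's routing value only needs to visit that shard.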
11. STRUCTURE OF INDEXED DATA
• One field can have only one type
• The type is determined on index creation or on first indexed document with that field
• User1 creates custom param “duration” of type String
• User2 wants to create “duration” of a numeric type -> error
• Solution: custom parameter hierarchies by type: params, numericParams, dateParams, …
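The typed-hierarchy idea can be sketched as a small pre-indexing step that places each custom parameter under a parent object chosen by its value type. The function name and the treatment of booleans are assumptions for illustration; only the parent names params, numericParams and dateParams come from the slide.

```python
from datetime import date, datetime
from numbers import Number


def bucket_custom_params(custom: dict) -> dict:
    """Place each custom parameter under a parent object that matches its value type."""
    doc = {"params": {}, "numericParams": {}, "dateParams": {}}
    for name, value in custom.items():
        if isinstance(value, bool):
            # bool is a subclass of int in Python; store it as a string here (assumption)
            doc["params"][name] = str(value).lower()
        elif isinstance(value, Number):
            doc["numericParams"][name] = value
        elif isinstance(value, (date, datetime)):
            doc["dateParams"][name] = value.isoformat()
        else:
            doc["params"][name] = str(value)
    return doc


if __name__ == "__main__":
    print(bucket_custom_params({"duration": "5 minutes"}))  # User1: string
    print(bucket_custom_params({"duration": 300}))          # User2: numeric
```

With this layout, User1's value is indexed as params.duration and User2's as numericParams.duration, so the two never compete for one field mapping.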
12. SCALABILITY
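• “We add more machines and it’s good”?
• Recommended shard size (10-50 GB)
• We can’t change the number of primary shards on a running index
• Lucene segments are read-only:
  • Deleting a document = bad (it is only marked as deleted until segments are merged)
  • Updating a document = bad (a delete plus an insert)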
13. OPTIONS FOR STRUCTURING INDEXES
• We need a structure that allows indexing and searching in an arbitrarily large amount of data
• One big, ever-growing index
  • Convenient for small amounts of data, but faces all scalability problems
• Index-per-day / index-per-week / index-per-size
• Index-per-day-per-retention
• Rollover
• Deletion should be done by deleting whole indexes, not individual documents
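Index-per-day combined with whole-index deletion can be sketched with two helpers: one that names the index for a given day and one that picks the indexes that have aged out of retention. The "prefix-YYYY.MM.DD" naming pattern and the function names are assumptions for illustration; the actual deletion call is left out.

```python
from datetime import date, datetime, timedelta


def daily_index_name(prefix: str, day: date) -> str:
    """Name of the index that holds documents for the given day."""
    return f"{prefix}-{day:%Y.%m.%d}"


def expired_indexes(prefix: str, today: date, retention_days: int, existing: list) -> list:
    """Indexes whose day is older than the retention window; each is dropped as a whole."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in existing:
        if not name.startswith(prefix + "-"):
            continue
        try:
            day = datetime.strptime(name[len(prefix) + 1:], "%Y.%m.%d").date()
        except ValueError:
            continue  # not one of our daily indexes
        if day < cutoff:
            expired.append(name)
    return expired


if __name__ == "__main__":
    today = date(2024, 3, 10)
    existing = [daily_index_name("logs", today - timedelta(days=n)) for n in range(10)]
    print(expired_indexes("logs", today, 7, existing))
```

Dropping an index removes its segments outright, which avoids the cost of marking individual documents as deleted.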
14. MANY INDEXES FOR SEARCH, ONE FOR INDEXING
• One search query can be directed to many indexes based on an index alias
• Supporting one (or several) active indexes for ingesting documents
• All other indexes are read-only
• This solves the problem with:
  • Growing data and growing size of shards
  • Deleting old data
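One way to keep a single alias for searching while only one index accepts writes is to flip the alias's write index when a new index is created. The sketch below only builds the request body for the Elasticsearch `_aliases` endpoint, using the `is_write_index` flag as I understand it; check it against the documentation for your Elasticsearch version. The index and alias names are made up.

```python
import json


def switch_write_index_actions(alias: str, old_index: str, new_index: str) -> dict:
    """Body for POST /_aliases: keep both indexes searchable, direct writes to the new one."""
    return {
        "actions": [
            {"add": {"index": old_index, "alias": alias, "is_write_index": False}},
            {"add": {"index": new_index, "alias": alias, "is_write_index": True}},
        ]
    }


if __name__ == "__main__":
    body = switch_write_index_actions("logs", "logs-000001", "logs-000002")
    print(json.dumps(body, indent=2))
```

Searches against the alias still cover every index behind it, while new documents go only to the current write index.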
15. EFFECTIVE INDEXING
• In real time (problem: too many requests to Elasticsearch)
• Storing in a database and indexing with a batch job
• Message queue (complex to implement; we use Kafka)
• In-memory queue (might lose data)
  • Batch-indexing when a given size or time threshold is reached
• Hybrid: bulk processing + database
  • Quick indexing with in-memory queue + subsequent check based on the data in the database
• Avoid updates (=delete + insert)
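The in-memory queue with a size or time threshold can be sketched as a small buffer that hands batches to a bulk-indexing callback. This is a minimal sketch: the class name, defaults and the injected `flush` callable are hypothetical, the age check runs only when a document is added (a real system would also flush from a timer), and anything still in the buffer is lost if the process dies.

```python
import time
from typing import Callable, List


class BatchIndexer:
    """Buffers documents and flushes them as one batch on a size or age threshold."""

    def __init__(self, flush: Callable[[List[dict]], None], max_docs: int = 500,
                 max_age_seconds: float = 5.0, clock: Callable[[], float] = time.monotonic):
        self._flush = flush            # e.g. a function that sends one bulk request
        self._max_docs = max_docs
        self._max_age = max_age_seconds
        self._clock = clock
        self._buffer: List[dict] = []
        self._first_added = 0.0

    def add(self, doc: dict) -> None:
        if not self._buffer:
            self._first_added = self._clock()  # age is measured from the oldest buffered doc
        self._buffer.append(doc)
        too_many = len(self._buffer) >= self._max_docs
        too_old = self._clock() - self._first_added >= self._max_age
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        if not self._buffer:
            return
        batch, self._buffer = self._buffer, []
        self._flush(batch)


if __name__ == "__main__":
    indexer = BatchIndexer(lambda batch: print(f"bulk-indexing {len(batch)} docs"), max_docs=3)
    for i in range(7):
        indexer.add({"id": i})
    indexer.flush()  # send whatever is left
```

In the hybrid variant, the same documents are also stored in the database, so a later check can re-index anything the in-memory path dropped.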
16. CONCLUSION
• Elasticsearch is easy to get running
• …and complex to scale
• Changes to a production setup are hard
• We must not throw scalability and multitenancy tasks to the Ops teams – they are application problems
• Elasticsearch internals impose unintuitive limitations (“The law of leaky abstractions”)