As a search consultant, I need to understand how a search application is used, with the end goal of providing a better search experience for the end user. That story can come from many places, and part of it can be found in the query logs.
Blog post about the same topic: http://nhhagen.wordpress.com/2013/11/28/query-log-analysis-using-logstash-elasticsearch-and-kibana/
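The blog post uses Logstash, Elasticsearch, and Kibana for this. As a tool-agnostic illustration of the first two questions one usually asks of a query log (what is searched most, and what returns nothing), here is a minimal plain-Python sketch. The log line format is a hypothetical assumption, not the format used in the post.

```python
import re
from collections import Counter

# Hypothetical log format (an assumption, not taken from the blog post):
#   2013-11-28T10:15:00 q=<query text> hits=<number of results>
LINE_RE = re.compile(r"^(?P<ts>\S+) q=(?P<query>.*?) hits=(?P<hits>\d+)$")

def parse_line(line):
    """Turn one raw log line into a dict, or None if it does not match the format."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "timestamp": m.group("ts"),
        "query": m.group("query").strip().lower(),  # normalize case so "iPad" == "ipad"
        "hits": int(m.group("hits")),
    }

def summarize(lines, top_n=3):
    """Return the most frequent queries and the sorted set of zero-result queries."""
    events = [e for e in map(parse_line, lines) if e is not None]
    top = Counter(e["query"] for e in events).most_common(top_n)
    zero_hits = sorted({e["query"] for e in events if e["hits"] == 0})
    return top, zero_hits

if __name__ == "__main__":
    sample = [
        "2013-11-28T10:15:00 q=iPad hits=42",
        "2013-11-28T10:16:00 q=ipad hits=42",
        "2013-11-28T10:17:00 q=ipda hits=0",
        "garbage line",
    ]
    print(summarize(sample))
```

Zero-result queries such as the misspelling above are often the quickest win: they point at missing synonyms, spelling correction, or content gaps.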
This document provides an overview of Elasticsearch, including what it is, how it works, and how to perform basic operations like indexing, updating, and searching documents. It explains that Elasticsearch allows for advanced search across large amounts of data by making documents searchable and scaling easily. It also demonstrates how to index, update, search for, and retrieve documents through RESTful API calls. Faceted search, aggregations, and cluster architecture are also summarized.
This document provides examples of using aggregations in Elasticsearch to calculate statistics and group documents. It shows terms, range, and histogram facets/aggregations that group documents by fields like state or population range and calculate statistics like average density. It also demonstrates nesting aggregations: first grouping by one field, such as state, and then further grouping and calculating stats within each state group. Finally, it lists the built-in aggregation bucketizers and calculators available in Elasticsearch.
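The nesting described above can be expressed as a JSON request body. The sketch below builds one in Python using the current `aggs` syntax rather than the older facets. The `state`, `density`, and `population` field names are assumptions. The body would be sent to an index's `_search` endpoint, and `state` would need to be a keyword field for a terms aggregation.

```python
import json

def state_stats_body(state_field="state", density_field="density",
                     population_field="population"):
    """Search body with two bucket aggregations, each nesting an average inside its buckets.

    All field names are hypothetical placeholders for your own mapping.
    """
    return {
        "size": 0,  # only aggregation results are wanted, not the matching hits
        "aggs": {
            # terms: one bucket per distinct state, with the average density inside it
            "by_state": {
                "terms": {"field": state_field},
                "aggs": {"avg_density": {"avg": {"field": density_field}}},
            },
            # range: fixed population bands, again with the average density inside each
            "population_bands": {
                "range": {
                    "field": population_field,
                    "ranges": [
                        {"to": 100_000},
                        {"from": 100_000, "to": 1_000_000},
                        {"from": 1_000_000},
                    ],
                },
                "aggs": {"avg_density": {"avg": {"field": density_field}}},
            },
        },
    }

if __name__ == "__main__":
    print(json.dumps(state_stats_body(), indent=2))
```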
Tapping into Scientific Data with Hadoop and Flink (Michael Häusler)
At ResearchGate, we constantly analyze scientific data to connect the world of science and make research open to all. It can be tricky to set up a process to continuously deliver improved versions of algorithms that tap into more than 100 million publications and corresponding bibliographic metadata. In this talk, we illustrate some (big) data engineering challenges of running data pipelines and incorporating results into the live databases that power our user-facing features every day. We show how Apache Flink helps us to improve performance, robustness, ease of maintenance - and most importantly - have more fun while building big data pipelines.
MongoDB .local Houston 2019: Best Practices for Working with IoT and Time-ser... (MongoDB)
Time series data is increasingly at the heart of modern applications: think IoT, stock trading, clickstreams, social media, and more. With the move from batch to real-time systems, the efficient capture and analysis of time series data can enable organizations to detect and respond to events ahead of their competitors, or to improve operational efficiency to reduce cost and risk. Working with time series data often differs from working with regular application data, and there are best practices you should observe.
This talk covers:
Common components of an IoT solution
The challenges involved with managing time-series data in IoT applications
Different schema designs, and how these affect memory and disk utilization – two critical factors in application performance.
How to query, analyze and present IoT time-series data using MongoDB Compass and MongoDB Charts
At the end of the session, you will have a better understanding of key best practices in managing IoT time-series data with MongoDB.
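One commonly recommended schema design for time series in MongoDB is the bucket pattern: one document per sensor per time window instead of one document per reading. Fewer, larger documents typically mean fewer index entries, which is one way schema choice affects memory and disk use. Whether the talk presents exactly this layout is an assumption. The sketch below only shapes the documents in plain Python (no driver calls), and the field names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def bucket_readings(readings):
    """Group (sensor_id, timestamp, value) tuples into one document per sensor per hour."""
    buckets = defaultdict(list)
    for sensor_id, ts, value in readings:
        hour = ts.replace(minute=0, second=0, microsecond=0)  # start of the hourly bucket
        buckets[(sensor_id, hour)].append({"ts": ts, "value": value})
    docs = []
    for (sensor_id, hour), samples in sorted(buckets.items()):
        values = [s["value"] for s in samples]
        docs.append({
            "sensor_id": sensor_id,
            "hour": hour,
            "count": len(samples),   # precomputed summary fields avoid rescanning samples
            "sum": sum(values),
            "samples": samples,      # raw readings embedded in the bucket
        })
    return docs

if __name__ == "__main__":
    readings = [
        ("s1", datetime(2019, 5, 1, 10, 5), 20.5),
        ("s1", datetime(2019, 5, 1, 10, 45), 21.5),
        ("s1", datetime(2019, 5, 1, 11, 0), 22.0),
    ]
    for doc in bucket_readings(readings):
        print(doc["sensor_id"], doc["hour"], doc["count"], doc["sum"])
```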
The core Search frameworks in Liferay 7 have been significantly retooled to benefit not only from Liferay's new modular architecture, but also from one of the most innovative players in the market: Elasticsearch, which replaces Lucene as the default search engine in Portal. This session will cover topics like clustering and scalability, unveil improvements (both Elasticsearch and Solr) like aggregations, filters, geolocation, "more like this" and other new query types, and also hot new features for the Enterprise like out-of-the-box Marvel cluster monitoring and Shield security.
André "Arbo" Oliveira joined Liferay in early 2014 as a senior engineer and leads the Search Infrastructure team. He's been writing code for a living for 22 years, 14 of them as a Java developer and architect. Ever since discovering Elasticsearch, he's vowed never to write another SQL WHERE clause again.
The document discusses MongoDB and Hadoop. It provides an overview of how MongoDB and Hadoop can be used together, including use cases in commerce, insurance and fraud detection. It describes the MongoDB Connector for Hadoop, which allows reading and writing to MongoDB from Hadoop tools like MapReduce, Pig and Hive. The document concludes with a demo of a movie recommendation platform that uses both MongoDB and Spark on Hadoop to power a movie browsing web application and generate recommendations.
MongoDB and Hadoop: Driving Business Insights (MongoDB)
MongoDB and Hadoop can work together to solve big data problems facing today's enterprises. We will take an in-depth look at how the two technologies complement and enrich each other with complex analyses and greater intelligence. We will take a deep dive into the MongoDB Connector for Hadoop and how it can be applied to enable new business insights with MapReduce, Pig, and Hive, and demo a Spark application to drive product recommendations.
Learn how you can enjoy the developer productivity, low TCO, and unlimited scale of MongoDB as a tick database for capturing, analyzing, and taking advantage of opportunities in tick data. This presentation will illustrate how MongoDB can easily and quickly store variable data formats, like top and depth of book, multiple asset classes, and even news and social networking feeds. It will explore aggregating and analyzing tick data in real time for automated trading or in batch for research and analysis, and how auto-sharding enables MongoDB to scale on commodity hardware to satisfy unlimited storage and performance requirements.
Webinar: How Banks Use MongoDB as a Tick Database (MongoDB)
Learn why MongoDB is spreading like wildfire across capital markets (and really every industry) and then focus in particular on how financial firms are enjoying the developer productivity, low TCO, and unlimited scale of MongoDB as a tick database for capturing, analyzing, and taking advantage of opportunities in tick data.
This document proposes a log management solution using Logstash, Elasticsearch, and Kibana. Logstash is used to collect, parse, and index logs into Elasticsearch for centralized storage and real-time search. Kibana provides visualization and analytics dashboards. The solution offers scalability, reliability, searchability, and a low-cost and flexible open source approach to solving the challenges of gathering, analyzing, and gaining insights from large volumes of log data from diverse sources.
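In this pipeline, Logstash's Elasticsearch output handles indexing for you. It may still help to see roughly what that step sends: the `_bulk` API takes newline-delimited JSON, with an action line followed by the document source, and the body must end with a newline. The sketch below builds such a payload in plain Python. The index name is a hypothetical example, and no HTTP call is made.

```python
import json

def to_bulk_payload(index_name, docs):
    """Serialize documents into the newline-delimited JSON body used by the _bulk endpoint."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))  # action/metadata line
        lines.append(json.dumps(doc))                                 # document source line
    return "\n".join(lines) + "\n"  # the bulk body must be terminated by a newline

if __name__ == "__main__":
    events = [
        {"@timestamp": "2013-11-28T10:15:00Z", "level": "ERROR", "message": "disk full"},
        {"@timestamp": "2013-11-28T10:15:02Z", "level": "INFO", "message": "retrying"},
    ]
    print(to_bulk_payload("logs-2013.11.28", events), end="")
```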
How Thermo Fisher Is Reducing Mass Spectrometry Experiment Times from Days to... (MongoDB)
Mass spectrometry is the gold standard for determining chemical compositions, with spectrometers often measuring the mass of a compound down to a single electron. This level of granularity produces an enormous amount of hierarchical data that doesn't fit well into rows and columns. In this talk, learn how Thermo Fisher is using MongoDB Atlas on AWS to allow their users to get near real-time insights from mass spectrometry experiments—a process that used to take days. We also share how the underlying database service used by Thermo Fisher was built on AWS.
Analyze and visualize non-relational data with DocumentDB + Power BI (Sriram Hariharan)
The session will show how to analyze and visualize non-relational data with DocumentDB + Power BI. We are in the midst of a paradigm shift in how we store and analyze data. Unstructured or flexible-schema data represents a large portion of the data within an organization, and everyone is eager to turn this data into meaningful business information. Unstructured data analytics does not need to be time-consuming and complex. Come learn how to analyze and visualize unstructured data in DocumentDB.
Understanding N1QL Optimizer to Tune Queries (Keshav Murthy)
Every flight has a flight plan. Every query has a query plan. You have probably seen its text form, the EXPLAIN plan. The query optimizer is responsible for creating this plan for every query, and it tries to make that plan optimal. In Couchbase, the query optimizer has to choose the best index for the query, decide which predicates to push down to index scans, create appropriate spans (scan ranges) for each index, understand the sort (ORDER BY) and pagination (OFFSET, LIMIT) requirements, and create the plan accordingly. When you think there is a better plan, you can hint the optimizer with USE INDEX. This talk will teach you how the optimizer selects indexes, index scan methods, and joins. It will also teach you how to analyze the optimizer's behavior using the EXPLAIN plan and how to change the choices the optimizer makes.
Modern architectures are moving away from a "one size fits all" approach. We are well aware that we need to use the best tools for the job. Given the large selection of options available today, chances are that you will end up managing data in MongoDB for your operational workload and in Spark for your high-speed data processing needs.
Description: When we model documents or data structures, some key aspects need to be examined, not only for functional and architectural purposes but also to account for the distribution of data nodes, streaming capabilities, aggregation and queryability options, and how we can integrate data processing software, like Spark, that can benefit from subtle but substantial model changes. A clear example is the choice between embedding and referencing documents, and its implications for high-speed processing.
Over the course of this talk we will detail the benefits of a good document model for the operational workload, as well as which transformations we should incorporate in our document model to take advantage of Spark's high-speed processing capabilities.
We will look into the different options for connecting these two systems, how to model according to different workloads, which operators we need to be aware of for top performance, and which designs and architectures we should put in place to make sure that all of these systems work well together.
Over the course of the talk we will showcase different libraries that enable integration between Spark and MongoDB, such as the MongoDB Hadoop Connector, the Stratio Connector, and the MongoDB Spark Native Connector.
By the end of the talk I expect the attendees to have an understanding of:
How to connect their MongoDB clusters to Spark
Which use cases show a net benefit from connecting these two systems
What kind of architecture design should be considered to make the most of Spark + MongoDB
How documents can be modeled for better performance, both operationally and when processing the data sets stored in MongoDB.
The talk is suitable for:
Developers who want to understand how to leverage Spark
Architects who want to integrate their existing MongoDB cluster and have real-time, high-speed processing needs
Data scientists who know about Spark, are playing with Spark, and want to integrate with MongoDB for their persistence layer
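To make the embedding-versus-referencing trade-off concrete, the sketch below produces the same flat rows, one per order, from both models. With embedding, a single pass over one collection is enough, similar to an explode step in Spark. With referencing, a join is required, and here it is a hash lookup resembling a broadcast join. This is plain Python standing in for Spark, with no connector API used. The `customers`/`orders` shape and field names are hypothetical.

```python
def flatten_orders(customers):
    """Embedded model: each customer document carries its orders; emit one row per order."""
    rows = []
    for customer in customers:
        for order in customer.get("orders", []):
            rows.append({
                "customer_id": customer["_id"],
                "country": customer.get("country"),
                "order_id": order["order_id"],
                "amount": order["amount"],
            })
    return rows

def join_referenced(customers, orders):
    """Referenced model: orders point at customers by id, so rows need a join."""
    by_id = {c["_id"]: c for c in customers}  # small side loaded into a hash map
    return [
        {
            "customer_id": o["customer_id"],
            "country": by_id[o["customer_id"]].get("country"),
            "order_id": o["order_id"],
            "amount": o["amount"],
        }
        for o in orders
        if o["customer_id"] in by_id  # inner-join semantics: drop dangling references
    ]
```

Which model is better depends on the workload. Embedding avoids the join for analytics but can make documents large or frequently rewritten on the operational side.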
In search of: A meetup about Liferay and Search 2016-04-20 (Tibor Lipusz)
Presentation for meetup http://www.meetup.com/Liferay-Budapest-Tech-Meetup/events/229996198/
Liferay Hungary Kft., Budapest, 2016-04-20
Extensible RESTful Applications with Apache TinkerPop (Varun Ganesh)
This document discusses building a graph database and domain-specific language (DSL) for analyzing Slack data. It defines entities like messages, users, and channels as graph nodes and their relationships as edges. A REST API is created to ingest and query the graph using TinkerPop and remote traversals. Custom traversal sources and classes define shorthand traversals and business logic to build the DSL, adding structure and meaning to queries over the Slack data graph.
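The core idea of such a DSL is to wrap generic traversal steps (follow edges with a given label) in domain vocabulary. The sketch below illustrates that layering with a tiny hand-rolled in-memory graph. It does not use TinkerPop or its Gremlin drivers, and the class, method, and edge-label names are hypothetical.

```python
from collections import defaultdict

class SlackGraph:
    """Tiny in-memory property graph: labelled vertices and labelled directed edges."""
    def __init__(self):
        self.vertices = {}                   # vertex id -> {"label": ..., **properties}
        self.out_edges = defaultdict(list)   # vertex id -> [(edge label, target id)]

    def add_vertex(self, vid, label, **props):
        self.vertices[vid] = {"label": label, **props}

    def add_edge(self, src, label, dst):
        self.out_edges[src].append((label, dst))

    def out(self, vid, label):
        """Generic traversal step: targets of outgoing edges carrying this label."""
        return [dst for lbl, dst in self.out_edges[vid] if lbl == label]

class SlackDSL:
    """Domain-specific shorthand layered over the generic traversal primitive."""
    def __init__(self, graph):
        self.g = graph

    def messages_in_channel(self, user_id, channel_id):
        # user -posted-> message -in-> channel
        return [m for m in self.g.out(user_id, "posted")
                if channel_id in self.g.out(m, "in")]
```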
Quick introduction to the click-through filter (pontneo)
Click-through filter is a relatively simple, well-constrained and flexible method for improving query results using clickstream data. This presentation gives a brief overview of what it does, including some evidence of its effectiveness.
Method detail described at http://www.slideshare.net/pontneo/click-through-filter
Analysis & demo (MySQL implementation using medical clickstream data) at http://www.slideshare.net/pontneo/click-through-filterprototyperesultsv2
Solr implementation described at http://www.slideshare.net/pontneo/better-search-implementation-of-click-through-filter-as-a-query-parser-plugin-for-apache-solr-lucene
If you're interested in testing the filter on your site or clickstream data, feel free to contact me or leave a comment.
Better Search: Click-through filter plugin – a flexible tool for improving se... (pontneo)
The document describes a click-through filter plugin for Apache Solr/Lucene that improves search results using user clickstream data. The plugin (1) adjusts result sorting based on item click popularity, (2) extends results to include related items connected by user clicks, and (3) allows customizing relevancy based on click traffic. It provides a flexible framework for query parsing with many configurable parameters to tailor filtering. The plugin aims to generate intelligent, dynamic search outputs that reflect changing user interests and collective knowledge based on actual user behavior.
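The first behaviour listed, adjusting result order by click popularity, can be sketched as a re-ranking step. The blend below (engine score plus a log-damped click count with a tunable weight) is an assumption chosen for illustration and is not the plugin's documented formula. The log damping keeps a few very popular items from swamping textual relevance.

```python
import math

def rerank(results, clicks, weight=0.5):
    """Re-order results by engine score plus a damped click-popularity bonus.

    results: list of (doc_id, score) pairs from the search engine.
    clicks:  dict mapping doc_id -> number of recorded clicks (hypothetical input).
    """
    def blended(item):
        doc_id, score = item
        return score + weight * math.log1p(clicks.get(doc_id, 0))  # log1p(0) == 0, no bonus
    return [doc_id for doc_id, _ in sorted(results, key=blended, reverse=True)]

if __name__ == "__main__":
    print(rerank([("a", 1.0), ("b", 0.9)], {"b": 100}))
```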
The document discusses using MongoDB as a tick store for financial data. It provides an overview of MongoDB and its benefits for handling tick data, including its flexible data model, rich querying capabilities, native aggregation framework, ability to do pre-aggregation for continuous data snapshots, language drivers and Hadoop connector. It also presents a case study of AHL, a quantitative hedge fund, using MongoDB and Python as their market data platform to easily onboard large volumes of financial data in different formats and provide low-latency access for backtesting and research applications.
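Pre-aggregation for continuous snapshots usually means rolling raw ticks into fixed-interval summary bars (open, high, low, close, volume) as they arrive, so that readers do not have to scan raw ticks. The sketch below shows that roll-up in plain Python. It assumes ticks arrive in time order as `(epoch_seconds, price, size)` tuples, which is a hypothetical layout. In a database these bars would typically be maintained as upserted summary documents.

```python
def ohlc_bars(ticks, bar_seconds=60):
    """Roll (epoch_seconds, price, size) ticks into OHLCV bars; ticks must be time-ordered."""
    bars = {}  # bar start time -> bar; dicts keep insertion order (Python 3.7+)
    for ts, price, size in ticks:
        start = ts - ts % bar_seconds  # align the tick to the start of its bar
        bar = bars.get(start)
        if bar is None:
            bars[start] = {"start": start, "open": price, "high": price,
                           "low": price, "close": price, "volume": size}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price          # last tick seen so far closes the bar
            bar["volume"] += size
    return list(bars.values())

if __name__ == "__main__":
    for bar in ohlc_bars([(0, 10.0, 1), (30, 12.0, 2), (59, 9.0, 1), (60, 11.0, 5)]):
        print(bar)
```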
Introduction to Elasticsearch for Business Intelligence and Application Insights (Data Works MD)
Video of the presentation is available here: https://youtu.be/L6EMnvALYtU
Talk: Elasticsearch for Business Intelligence and Application Insights
Speaker: Sean Donnelly
Elasticsearch is a distributed, RESTful search and analytics engine capable of solving a growing number of use cases. In this talk, I’ll discuss the fundamentals of storage and retrieval in Elasticsearch, why we decided to use it for search in our applications, and how you can also leverage it for both business intelligence and application insights.
Natixis Open Day 2018 presentation about Elasticsearch:
- Elasticsearch is a distributed, RESTful search and analytics engine for indexing and searching JSON documents.
- It allows for distributed logging, document indexing, inexact searches, and custom relevance scoring.
- Documents are organized into indexes, types, and shards for distributed querying and storage.
- Documents can be created, updated, and deleted via REST API calls. Relevance can be customized through boosting, functions, and other scoring methods.
- Kibana provides visualization and analytics capabilities for Elasticsearch data. Logstash and Beats facilitate data collection and shipping.
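Customizing relevance "through boosting, functions, and other scoring methods" commonly means a `function_score` query. The sketch below builds one as a Python dict. It boosts a title match and then adds a log-damped popularity signal to the text score. The `title` and `likes` field names are hypothetical. `boost_mode: "sum"` is used because with `"multiply"` a document with zero likes would get `log1p(0) = 0` and a zero score.

```python
import json

def boosted_query(text, popularity_field="likes"):
    """Full-text match on title, plus log10(1 + popularity) added to the score."""
    return {
        "query": {
            "function_score": {
                # query-time boost on the title match
                "query": {"match": {"title": {"query": text, "boost": 2}}},
                "field_value_factor": {
                    "field": popularity_field,
                    "modifier": "log1p",  # dampen large popularity values
                    "missing": 0,         # documents without the field count as 0
                },
                "boost_mode": "sum",      # final score = query score + function score
            }
        }
    }

if __name__ == "__main__":
    print(json.dumps(boosted_query("elasticsearch"), indent=2))
```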
Elasticsearch introduction talk: an overview of the API, functionality, and use cases. What can be achieved, and how do you scale? What is Kibana, and how can it benefit your business?
This document discusses using Elasticsearch for social media analytics and provides examples of common tasks. It introduces Elasticsearch basics like installation, indexing documents, and searching. It also covers more advanced topics like mapping types, facets for aggregations, analyzers, nested and parent/child relations between documents. The document concludes with recommendations on data design, suggesting indexing strategies for different use cases like per user, single index, or partitioning by time range.
Enhancement of Searching and Analyzing the Document using Elastic Search (IRJET Journal)
Elasticsearch is an open source search and analytics engine that allows users to search, analyze, and get insights from data in near real-time. The document discusses how Elasticsearch uses inverted indexes and tokenization to enable fast searching of structured documents. It provides examples of indexing employee documents, performing simple and advanced searches on the data, and exploring techniques like stopword filtering that help optimize search performance. Elasticsearch allows users to scale their search capabilities horizontally across many servers to handle large volumes of data.
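The inverted index, tokenization, and stopword filtering described above can be illustrated with a toy version in plain Python. Lucene's real structures (term dictionaries, compressed postings, positions, scoring) are far more elaborate. The stopword list here is a tiny illustrative assumption, not Elasticsearch's default.

```python
import re
from collections import defaultdict

STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "is"}  # tiny illustrative list

def tokenize(text):
    """Lowercase, split on non-alphanumeric characters, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]

def build_index(docs):
    """Map each term to the sorted list of ids of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def search_all(index, query):
    """AND query: ids of documents containing every non-stopword query term."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))  # intersect posting lists
    return sorted(result)
```

Because a lookup touches only the posting lists of the query terms, search cost depends on those lists rather than on the total number of documents. Dropping stopwords keeps the longest lists out of the index entirely.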
This document discusses Elasticsearch and provides examples of its real-world uses and basic functionality. It contains:
1) An overview of Elasticsearch and how it can be used for full-text search, analytics, and structured querying of large datasets. Dell and The Guardian are discussed as real-world use cases.
2) Explanations of basic Elasticsearch concepts like indexes, types, mappings, and inverted indexes. Examples of indexing, updating, and deleting documents.
3) Details on searching and filtering documents through queries, filters, aggregations, and aliases. Query DSL and examples of common queries like term, match, range are provided.
4) A discussion of potential data modeling designs for indexing user
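The common query types listed in point 3 (term, match, range) are often combined in a `bool` query. The scored full-text part goes in `must`, and the exact-value and range conditions go in `filter`, where they do not affect scoring and can be cached. The sketch builds such a body as a Python dict. The `about`, `department`, and `age` fields are hypothetical, and `department` would need to be a keyword field for the term query to match exact values.

```python
import json

def employee_query(text, department, min_age):
    """bool query: scored full-text match plus non-scoring exact-value and range filters."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"about": text}}],           # analyzed, contributes to score
                "filter": [
                    {"term": {"department": department}},       # exact value, no analysis
                    {"range": {"age": {"gte": min_age}}},       # inclusive lower bound
                ],
            }
        }
    }

if __name__ == "__main__":
    print(json.dumps(employee_query("rock climbing", "engineering", 30), indent=2))
```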
This document summarizes how Elasticsearch can be used for scaling analytics applications. Elasticsearch is an open source, distributed search and analytics engine that can index large volumes of data. It automatically shards and replicates data across nodes for redundancy and high availability. Analytics queries like date histograms, statistical facets, and geospatial searches can retrieve insightful results from large datasets very quickly. The document provides an example of using Elasticsearch to perform sentiment analysis, location tagging, and analytical queries on over 100 million social media documents.
Elasticsearch is presented as an engine specialized in real-time search, aggregation, and analytics. The document outlines Elasticsearch concepts like indexing, mapping, analysis, and the query DSL. Examples are provided for real-time search queries and for aggregations including terms, date histograms, and geo distance. Lessons learned from using Elasticsearch at LARC are also discussed.
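Elasticsearch evaluates geo-distance filters and aggregations server-side. The plain-Python sketch below shows only the underlying idea: compute the great-circle (haversine) distance from an origin and keep documents within a radius. The `location` field layout is hypothetical, and results may differ slightly from Elasticsearch's own distance calculation.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within(docs, origin, radius_km):
    """Keep documents whose 'location' (lat, lon) lies within radius_km of origin."""
    lat0, lon0 = origin
    return [d for d in docs if haversine_km(lat0, lon0, *d["location"]) <= radius_km]
```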
Looking at Content Recommendations through a Search Lens - Extended Version (Sonya Liberman)
Sonya Liberman leads the Personalization team @ Outbrain's Recommendations group, developing large-scale machine learning algorithms for Outbrain's content recommendations platform, which serves tens of billions of real-time recommendations a day. She specializes in Information Retrieval, Machine Learning, and Computational Linguistics. Before joining Outbrain, she led Research and Algorithms @ ConvertMedia (acquired by Taboola). She holds an MSc in Computer Science and a BSc in Computer Science and Computational Biology.
This invited talk was given at the Recommender Systems Workshop 2017, University of Haifa.
Haystack 2019 - Improving Search Relevance with Numeric Features in Elasticse...OpenSource Connections
Recently Elasticsearch has introduced a number of ways to improve search relevance of your documents based on numeric features. In this talk I will present the newly introduced field types of "rank_feature", "rank_features" ,"dense_field", and "sparse_vector" and discuss in what situations and how they can be used to boost scores of your documents. I will also talk about the inner workings of queries based on these fields, and related performance considerations.
Elasticsearch a real-time distributed search and analytics enginegautam kumar
Elasticsearch is a distributed, real-time search and analytics engine. It allows users to search, analyze, and combine full-text and structured data in a single system. Elasticsearch can index and search all fields and is accessed via a RESTful API. Popular users include Wikipedia, GitHub, and Stack Overflow. The document then discusses Elasticsearch terminology, installation, indexing and retrieving data, mappings and analysis, exact values versus full text search, advanced search capabilities, and index management.
This document discusses tuning MongoDB performance. It covers tuning queries using the database profiler and explain commands to analyze slow queries. It also covers tuning system configurations like Linux settings, disk I/O, and memory to optimize MongoDB performance. Topics include setting ulimits, IO scheduler, filesystem options, and more. References to MongoDB and Linux tuning documentation are also provided.
Building a Scalable Inbox System with MongoDB and Javaantoinegirbal
Many user-facing applications present some kind of news feed/inbox system. You can think of Facebook, Twitter, or Gmail as different types of inboxes where the user can see data of interest, sorted by time, popularity, or other parameter. A scalable inbox is a difficult problem to solve: for millions of users, varied data from many sources must be sorted and presented within milliseconds. Different strategies can be used: scatter-gather, fan-out writes, and so on. This session presents an actual application developed by 10gen in Java, using MongoDB. This application is open source and is intended to show the reference implementation of several strategies to tackle this common challenge. The presentation also introduces many MongoDB concepts.
"ElasticSearch in action" by Thijs Feryn.
ElasticSearch is a really powerful search engine, NoSQL database & analytics engine. It is fast, it scales and it's a child of the Cloud/BigData generation. This talk will show you how to get things done using ElasticSearch. The focus is on doing actual work, creating actual queries and achieving actual results. Topics that will be covered: - Filters and queries - Cluster, shard and index management - Data mapping - Analyzers and tokenizers - Aggregations - ElasticSearch as part of the ELK stack - Integration in your code.
MongoDB Europe 2016 - Enabling the Internet of Things at Proximus - Belgium's...MongoDB
Proximus is one of the biggest Telecom companies in the Belgian market. This year the company began developing a new IoT network using LoRaWan technology. The talk will detail our development team’s search for a database suited to meet the needs of our IoT project, the selection and implementation of MongoDB as a database, as well as well as how we built a system for storing a variety of sensor data with high throughput by leveraging sleepy.mongoose. The talk will also discuss how different decisions around data storage impact applications in regards to both performance and total cost.
Tracking and visualizing COVID-19 with Elastic stackAnna Ossowski
This document summarizes Julie Zhong's presentation on tracking and visualizing COVID-19 data with the Elastic Stack. The presentation covered:
1. Using Elastic Cloud Enterprise to deploy and manage Elasticsearch clusters for indexing COVID-19 case data.
2. Indexing and ingesting COVID-19 case data from various sources into Elasticsearch using Logstash, the Kibana machine learning app, and APIs.
3. Performing searches, aggregations, and visualizations on the COVID-19 data in Kibana to analyze and discover trends in the data over time.
4. Creating dashboards in Kibana to visualize metrics and aggregations of the COVID-19 data.
Introduction to Apache Drill - interactive query and analysis at scaleMapR Technologies
This document introduces Apache Drill, an open source interactive analysis engine for big data. It was inspired by Google's Dremel and supports standard SQL queries over various data sources like Hadoop and NoSQL databases. Drill provides low-latency interactive queries at scale through its distributed, schema-optional architecture and support for nested data formats. The talk outlines Drill's capabilities and status as a community-driven project under active development.
SHARE is building a free, open dataset about the entire research lifecycle. It uses the Open Science Framework (OSF) to collect and store this data. The presentation demonstrates SHARE's search API, which allows querying the dataset using Elasticsearch queries. An example shows aggregating the top tags used in the dataset. The results return the top tags and the number of documents associated with each tag, with "ecological" being the most common tag. SHARE is developing a Python library to make interacting with the search API easier by handling the JSON request/response. The library can convert the Elasticsearch response into a dataframe for further analysis or visualization of the results.
Couchbase Tutorial: Big data Open Source Systems: VLDB2018Keshav Murthy
The document provides an agenda and introduction to Couchbase and N1QL. It discusses Couchbase architecture, data types, data manipulation statements, query operators like JOIN and UNNEST, indexing, and query execution flow in Couchbase. It compares SQL and N1QL, highlighting how N1QL extends SQL to query JSON data.
Eating Our Own Dog Food: How to be taken seriously when it comes to adding va...UXPA Boston
As user experience professionals, we've had a better-than-front-row seat when it comes to understanding the humans who try to use our products, services, and platforms. We've been on or in the field, researching users, gaining deep empathy and insights, and finding how to pull business and user needs together, for the happiest Venn diagram since "you got your chocolate in my peanut butter." We've gotten really good at this. When given some room and runway, we've turned journeys that were fraught with friction into seamless experiences for customers, clients, employees, patients, and so many other kinds of users. There's just one problem. Like the accountant, attorney, marketer, and more, we've been struck —mightily — by the curse of knowledge. We have our own jargon, which has become our seemingly secret internal UX code. We can talk in concepts with each other toward great results but, when we talk to our peers, stakeholders, leadership, and others, we forget to tailor our business and technology to their human needs. So they get lost, confused, and frustrated. In these cases, we're providing a terrible user experience. Eating Our Own Dog Food will give you a more objective way to view, talk about, and show the tremendous value that UX brings to the table, in a way that our users in this circumstance can understand it, be energized by it, and be sure to invite us to "the table."
#11: Graphs:
- Query load over time
- Average query latency over time
- Top queries
- Top queries with 0 hits
- Top search modes
- Top refiners/facets used
- Changes in query load (pct.): last hour, last day, last 30 days
- Refiner/facet usage over time
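Most of these graphs boil down to simple counts over the log events. Here is a minimal sketch, assuming each log event is a dict with hypothetical `timestamp`, `query` and `hits` fields (the real log schema is not shown in these notes). It computes query load per hour, the top queries and the top zero-hit queries:

```python
from collections import Counter
from datetime import datetime

# Hypothetical log events; the field names are assumptions, not the real schema.
events = [
    {"timestamp": "2013-11-28T10:05:00", "query": "elasticsearch", "hits": 42},
    {"timestamp": "2013-11-28T10:40:00", "query": "kibana", "hits": 0},
    {"timestamp": "2013-11-28T11:15:00", "query": "elasticsearch", "hits": 40},
    {"timestamp": "2013-11-28T11:20:00", "query": "kibana", "hits": 0},
    {"timestamp": "2013-11-28T11:30:00", "query": "logstash", "hits": 7},
]

def query_load_per_hour(events):
    """Count queries per hour bucket (the 'query load over time' graph)."""
    load = Counter()
    for e in events:
        ts = datetime.fromisoformat(e["timestamp"])
        load[ts.replace(minute=0, second=0, microsecond=0)] += 1
    return dict(sorted(load.items()))

def top_queries(events, n=10):
    """Most frequent query strings."""
    return Counter(e["query"] for e in events).most_common(n)

def top_zero_hit_queries(events, n=10):
    """Most frequent queries that returned no results."""
    return Counter(e["query"] for e in events if e["hits"] == 0).most_common(n)

print(query_load_per_hour(events))
print(top_zero_hit_queries(events))
```

The zero-hit list is usually the most actionable one, because it points directly at missing content or missing synonyms.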
#12: Graphs:
- Query latency distribution (count and pct.)
- Average query latency over time
- Query latency distribution (count and pct.) over time
Table:
- Queries with over 200 ms latency (really slow queries)
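A sketch of the latency distribution (count and percentage per bucket) and the slow-query table. The `latency_ms` field and the bucket boundaries are assumptions chosen only for illustration. The 200 ms threshold comes from the slide:

```python
from collections import Counter

# Hypothetical bucket upper bounds in milliseconds (illustrative only).
BUCKETS = [50, 100, 200, 500, 1000]
SLOW_THRESHOLD_MS = 200  # "queries over 200ms latency"

def bucket_label(ms):
    """Return the label of the first bucket whose upper bound covers the latency."""
    for upper in BUCKETS:
        if ms <= upper:
            return f"<={upper}ms"
    return f">{BUCKETS[-1]}ms"

def latency_distribution(latencies):
    """Map each bucket label to (count, percentage of all queries)."""
    counts = Counter(bucket_label(ms) for ms in latencies)
    total = len(latencies)
    return {label: (c, 100.0 * c / total) for label, c in counts.items()}

def slow_queries(events, threshold=SLOW_THRESHOLD_MS):
    """Events slower than the threshold, slowest first."""
    slow = [e for e in events if e["latency_ms"] > threshold]
    return sorted(slow, key=lambda e: e["latency_ms"], reverse=True)
```

Running the same distribution per time window (per hour, per day) gives the "distribution over time" graph.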
#14: A “typical” search application:
1. The user sends a query to the search application.
2. That query finds its way to the search application API.
3. The logic in the application is set up to do 3 parallel queries against the search engines for different types of data.
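That fan-out can be sketched as follows. The three backend functions are hypothetical stand-ins for the real search-engine calls, and `concurrent.futures` is just one way to run them in parallel. The sketch also measures the end-to-end latency, which is the number the user actually experiences:

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical backends standing in for queries against different search engines.
def search_documents(q):
    return [f"doc:{q}"]

def search_people(q):
    return []

def search_products(q):
    return [f"product:{q}", f"product:{q}-2"]

BACKENDS = {
    "documents": search_documents,
    "people": search_people,
    "products": search_products,
}

def handle_query(q):
    """Run one user query against all backends in parallel and collect the results."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(BACKENDS)) as pool:
        futures = {name: pool.submit(fn, q) for name, fn in BACKENDS.items()}
        results = {name: f.result() for name, f in futures.items()}
    latency_ms = (time.perf_counter() - start) * 1000
    return results, latency_ms
```

Each backend may log its own query, but only `handle_query` sees the whole picture: the user's query, every sub-result and the total latency.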
#15: Query logs can be taken from the search engine, but these are not the user's query log; they are technical query logs. Those logs can be analyzed to figure out when you need new servers, more RAM, more CPU, etc.
#16: When you create the query log in the search application API, just before returning the response to the user, you can put the context of the query and the “sum” into the query log.
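Here is a sketch of building one user-level log event just before the response is returned, serialized as one JSON object per line. All field names (`query`, `context`, `hits_per_source`, `total_hits`, `latency_ms`) are hypothetical. `total_hits` is only one possible reading of the “sum” mentioned above:

```python
import json
from datetime import datetime, timezone

def build_log_event(query, context, results, latency_ms):
    """Build one user-level query log event (hypothetical schema)."""
    hits_per_source = {name: len(hits) for name, hits in results.items()}
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "query": query,
        "context": context,  # e.g. search mode and selected refiners/facets
        "hits_per_source": hits_per_source,
        "total_hits": sum(hits_per_source.values()),  # assumed meaning of the "sum"
        "latency_ms": round(latency_ms, 1),
    }

def to_json_line(event):
    """Serialize the event as one JSON object per line (JSON Lines)."""
    return json.dumps(event) + "\n"
```

In the API this would be used as `log_file.write(to_json_line(build_log_event(...)))`, where `log_file` is an open, append-mode log file.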
#18: For now, only used to read the JSON log files and transport the log events to Elasticsearch.
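The tool on this slide does this work in the described setup. The following Python sketch only illustrates what that step does: it reads JSON-lines log content and turns each line into an event, counting lines that are not valid JSON instead of failing. The one-object-per-line layout is an assumption:

```python
import json

def read_log_events(lines):
    """Parse JSON-lines log content into event dicts, counting malformed lines."""
    events, bad = [], 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            events.append(json.loads(line))
        except json.JSONDecodeError:
            bad += 1  # count malformed lines rather than aborting the whole file
    return events, bad
```

With a real file this would be `with open("query.log") as fh: events, bad = read_log_events(fh)`, where the file name is hypothetical.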
#19: Indexes all the log events into per-day indices.
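Per-day indices are named after the event date. This sketch assumes the `logstash-YYYY.MM.dd` pattern that Logstash's Elasticsearch output long used as its default index name. The prefix is configurable. Logstash's `@timestamp` is in UTC, so the date is taken in UTC here, and naive timestamps are treated as already UTC:

```python
from datetime import datetime, timezone

def daily_index_name(timestamp, prefix="logstash"):
    """Map an ISO-8601 timestamp to a per-day index name, e.g. logstash-2013.11.28."""
    ts = datetime.fromisoformat(timestamp)
    if ts.tzinfo is not None:
        ts = ts.astimezone(timezone.utc)  # index by UTC date
    return f"{prefix}-{ts:%Y.%m.%d}"
```

One index per day keeps each index small and makes retention trivial: old days are removed by deleting whole indices.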
#20: Queries Elasticsearch and builds graphs using facets. Provides an interactive application for generating graphs on the fly.
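Facets were deprecated in Elasticsearch 1.0 and removed in 2.0 in favour of aggregations. The sketch below builds the kind of search body a "query load over time" plus "top queries" graph could send today, for example to `logstash-*/_search`. The `query.keyword` field name is an assumption about the log mapping. `fixed_interval` is the parameter name in Elasticsearch 7.2 and later, and older versions used `interval`:

```python
import json

def query_load_request(interval="1h", top_n=10):
    """Build an Elasticsearch search body: query load over time plus top queries."""
    return {
        "size": 0,  # only the aggregations are needed, not the matching log events
        "aggs": {
            "query_load": {
                "date_histogram": {"field": "@timestamp", "fixed_interval": interval}
            },
            "top_queries": {
                "terms": {"field": "query.keyword", "size": top_n}
            },
        },
    }

print(json.dumps(query_load_request(), indent=2))
```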