RESTful API for a news application using the Strapi headless CMS Rakesh Falke
Strapi is a flexible, open-source Headless CMS that gives developers the freedom to choose their favorite tools and frameworks while also allowing editors to easily manage and distribute their content.
For documentation https://strapi.io/documentation/3.0.0-beta.x/getting-started/introduction.html
This document provides an overview of Entity Framework Code First, including its basic workflow, database initialization strategies, configuring domain classes using data annotations and fluent API, modeling relationships like one-to-one, one-to-many and many-to-many, and performing migrations using automated and code-based approaches. Code First allows writing classes first and generating the database, starting from EF 4.1, and supports domain-driven design principles.
The document provides an introduction to web APIs and REST. It defines APIs as methods to access data and workflows from an application without using the application itself. It describes REST as an architectural style for APIs that uses a client-server model with stateless operations and a uniform interface. The document outlines best practices for REST APIs, including using HTTP verbs like GET, POST, PUT and DELETE to perform CRUD operations on resources identified by URIs. It also discusses authentication, authorization, security concerns and gives examples of popular REST APIs from Facebook, Twitter and other services.
Presented by Nikola Vasilev on SkopjeTechMeetup 7.
Representational state transfer (REST) can be thought of as the language of the Internet. Now with cloud usage on the rise, REST is a logical choice for building APIs that allow end users to connect and interact with cloud services. This talk will deliver more insight into the challenges of building and maintaining good and clean RESTful APIs.
The document summarizes features and capabilities of Oracle Database including:
- Support for structured and unstructured data types including images, XML, and multimedia.
- Tools for managing growth of data and enabling innovation with different data types.
- Self-managing capabilities that help liberate DBAs from resource management tasks.
- Features for high performance, availability, security and compliance at lower costs.
Automated API testing is becoming a crucial part of most projects. This whitepaper provides an insight into how API automation with REST Assured is the way forward in API testing.
Presentation on Strapi CMS. It provides a quick introduction, explaining the what, why and how of Strapi's capabilities. The presentation includes a working demo of setting up a Strapi instance, defining a content type and authoring content.
1) The document discusses information retrieval and search engines. It describes how search engines work by indexing documents, building inverted indexes, and allowing users to search indexed terms.
2) It then focuses on Elasticsearch, describing it as a distributed, open source search and analytics engine that allows for real-time search, analytics, and storage of schema-free JSON documents.
3) The key concepts of Elasticsearch include clusters, nodes, indexes, types, shards, and documents. Clusters hold the data and provide search capabilities across nodes.
This document provides an overview of developing applications using Oracle Application Express (APEX). It discusses the APEX architecture and components used for browser-based application development like the Application Builder, SQL Workshop, and Administrator. The benefits of APEX are also summarized like rapid development, mobile support, and use cases. Steps for creating a demo "help desk" application are outlined, including designing the database tables, loading sample data, and basic application navigation.
Elasticsearch is a free and open source distributed search and analytics engine. It allows documents to be indexed and searched quickly and at scale. Elasticsearch is built on Apache Lucene and uses RESTful APIs. Documents are stored in JSON format across distributed shards and replicas for fault tolerance and scalability. Elasticsearch is used by many large companies due to its ability to easily scale with data growth and handle advanced search functions.
This presentation covers the differences between Elasticsearch and relational databases. It also includes a glossary of Elasticsearch terms and its basic operations.
MongoDB is a cross-platform document-oriented database system that is classified as a NoSQL database. It avoids the traditional table-based relational database structure in favor of JSON-like documents with dynamic schemas. MongoDB was first developed in 2007 and is now the most popular NoSQL database system. It uses collections rather than tables and documents rather than rows. Documents can contain nested objects and arrays. MongoDB supports querying, indexing, and more. Queries use JSON-like documents and operators to specify search conditions. Documents can be inserted, updated, and deleted using various update operators.
This presentation describes the big picture of REST APIs. It explains the theory with practical examples and aims to cover the overall REST API domain.
Learning To Rank has been the first integration of machine learning techniques with Apache Solr allowing you to improve the ranking of your search results using training data.
One limitation is that documents have to contain the keywords that the user typed in the search box in order to be retrieved (and then reranked). For example, the query “jaguar” won’t retrieve documents containing only the terms “panthera onca”. This is called the vocabulary mismatch problem.
Neural search is an Artificial Intelligence technique that allows a search engine to reach those documents that are semantically similar to the user’s information need without necessarily containing those query terms; it learns the similarity of terms and sentences in your collection through deep neural networks and numerical vector representations (so no manual synonyms are needed!).
This talk explores the first Apache Solr official contribution about this topic, available from Apache Solr 9.0.
We start with an overview of neural search (Don’t worry - we keep it simple!): we describe vector representations for queries and documents, and how Approximate K-Nearest Neighbor (KNN) vector search works. We show how neural search can be used along with deep learning techniques (e.g., BERT) or directly on vector data, and how we implemented this feature in Apache Solr, giving usage examples!
Join us as we explore this new exciting Apache Solr feature and learn how you can leverage it to improve your search experience!
This document outlines the topics covered in an Edureka course on MongoDB. The course contains 8 modules that cover MongoDB fundamentals, CRUD operations, schema design, administration, scaling, indexing and aggregation, application integration, and additional concepts and case studies. Each module contains multiple topics that will be taught through online instructor-led classes, recordings, quizzes, assignments, and support.
These slides introduce ETL testing. Anyone who wants to start learning ETL testing can make use of this deck. It covers content related to ETL testing schemas.
This document provides an introduction and overview of REST APIs. It defines REST as an architectural style based on web standards like HTTP that defines resources that are accessed via common operations like GET, PUT, POST, and DELETE. It outlines best practices for REST API design, including using nouns in URIs, plural resource names, GET for retrieval only, HTTP status codes, and versioning. It also covers concepts like filtering, sorting, paging, and common queries.
Hot Topics: The DuraSpace Community Webinar Series,
“Introducing DSpace 7: Next Generation UI”
Curated by Claire Knowles, Library Digital Development Manager, The University of Edinburgh.
Introducing DSpace 7
February 28, 2017 presented by: Claire Knowles - The University of Edinburgh, Art Lowel - Atmire, Andrea Bollini - 4Science, Tim Donohue – DuraSpace
The document provides an overview of Entity Framework and Code First approach. It discusses the key features of Entity Framework including object-relational mapping, support for various databases, and using LINQ queries. It also covers the advantages of Entity Framework like productivity, maintainability and performance. Furthermore, it explains the different domain modeling approaches in Entity Framework - Model First, Database First and Code First along with code examples.
TestNG is a testing framework inspired by JUnit and NUnit, which can be used as a core unit test framework for Java projects.
Demo: https://github.com/bethmi/testng-demo
This document provides an overview of ASP.NET Web API, a framework for building HTTP-based services. It discusses key Web API concepts like REST, routing, actions, validation, OData, content negotiation, and the HttpClient. Web API allows building rich HTTP-based apps that can reach more clients by embracing HTTP standards and using HTTP as an application protocol. It focuses on HTTP rather than transport flexibility like WCF.
Talk given for the #phpbenelux user group, March 27th in Gent (BE), with the goal of convincing developers who are used to building PHP/MySQL apps to broaden their horizons when adding search to their site. Be sure to also have a look at the notes for the slides; they explain some of the screenshots, etc.
An accompanying blog post about this subject can be found at http://www.jurriaanpersyn.com/archives/2013/11/18/introduction-to-elasticsearch/
This document provides an overview of ASP.NET Core 1.0 and discusses its evolution from previous ASP.NET technologies. It covers the ASP.NET architecture, Model-View-Controller pattern, ASP.NET MVC and Web API project templates, tag helpers, consuming Web APIs, and using JavaScript frameworks with ASP.NET Core.
Accessibility Testing is one of the important types of testing that add value to your business and deliver user friendly applications. Axe Core is a very powerful framework that can help the team to build web products that are inclusive. In this article, different ways to test the Accessibility and the automation part have been discussed in full length. You can achieve Accessibility Testing with the help of the following methods/approaches
Elastic search custom chinese analyzer LearningTech
Quick intro to writing custom analyzers using the nGram tokenizer. This presentation tackles issues that arise when dealing with Chinese data and demonstrates some examples of how to resolve those issues.
Elasticsearch Architecture & What's New in Version 5 Burak TUNGUT
General architectural concepts of Elasticsearch and what's new in version 5. The examples were prepared with our company's business data, so they are excluded from the presentation.
Using Elasticsearch for site search Olga Lavrentieva
Dmitry Zhlobo, Ruby and Rails Developer at Twinslash
“Using Elasticsearch for site search”
Building good search for a website is a complex, non-trivial task. In his talk, Dmitry explains how to solve it with Elasticsearch.
The talk covers how Elasticsearch works with text and other data, from analyzing and indexing documents to search and aggregation. It shows step by step, with examples, how to configure search that takes into account, for example, the morphology and phonetics of the Russian language. Dmitry also explains how to use all this in Ruby applications, how to organize adding documents to the index, and more.
I just hacked your app! - Marcos Placona - Codemotion Rome 2017 Codemotion
Android security is nowhere near where it should be. I have been able to hack and get sensitive information from a few different apps and I’m just an amateur hacker at best. It’s easy to forget mobile devices aren’t as safe as we think they are. In this session we will explore a number of ways an Android app can be exploited and methods we can use to avoid these attacks. We will finish by looking at common techniques that will help you protect sensitive information within your application by adding tampering detection and making sure every external communication request is made securely.
Full-Text Search Explained - Philipp Krenn - Codemotion Rome 2017 Codemotion
Today’s applications are expected to provide powerful full-text search. But how does that work in general and how do I implement it on my site or in my application? Actually, this is not as hard as it sounds at first. This talk covers: * How full-text search works in general and what the differences to databases are. * How the score or quality of a search result is calculated. * How to implement this with Elasticsearch. Attendees will learn how to add common search patterns to their applications without breaking a sweat.
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more... Oleksiy Panchenko
In the age of information and big data, the ability to quickly and easily find a needle in a haystack is extremely important. Elasticsearch is a distributed and scalable search engine which provides rich and flexible search capabilities. Social networks (Facebook, LinkedIn), media services (Netflix, SoundCloud), Q&A sites (StackOverflow, Quora, StackExchange) and even GitHub - they all find data for you using Elasticsearch. In conjunction with Logstash and Kibana, Elasticsearch becomes a powerful log engine which lets you process, store, analyze, search through and visualize your logs.
Video: https://www.youtube.com/watch?v=GL7xC5kpb-c
Scripts for the Demo: https://github.com/opanchenko/morning-at-lohika-ELK
2. Agenda
Udemy
Elasticsearch and Basic Search Concepts
Elasticsearch usage at Udemy:
Indexing
Basic search functions
Text analysis
Multi-language support
Extracting TF and IDF values
3. Udemy
Video-based online learning platform and marketplace
12+ million active users
50 thousand courses
Web, mobile and Apple TV applications
Offices in San Francisco, Ankara and Dublin
254 employees in total, 85+ engineers
Udemy Turkey office:
4. Udemy (Discovery)
Goal: Match the right user with the right course
Sub-teams
Search
Recommendation
Channels
Work supported by Data Science
Some of the tools used:
5. Elasticsearch
Search engine and non-relational database
Written in Java
Open source
Based on Apache Lucene
Distributed
Fault tolerant
Scalable
6. Comparison with Apache Solr
Scalability advantage
Ease of maintenance
More convenient API
7. Basic Elasticsearch (ES) Concepts
Node: Each running ES server
Cluster: A group of nodes that holds the data in a distributed and consistent way
Index: The structure that stores data (documents) of a similar nature
Document: The basic unit of data that can be indexed
JSON
Field: The fields on a document
Shard: A partition within an index that holds a subset of its documents
9. Shards and Replicas
Shard:
Each shard is a Lucene index.
Queries run on each shard, and the results are then merged on the coordinating node (scatter/gather).
The number of shards affects performance and scalability.
Replica:
Fault tolerance
Improved query performance
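The scatter/gather step can be illustrated with a minimal sketch in plain Python. It is not how Elasticsearch is implemented: the shards are hypothetical in-memory dictionaries, the "score" is just a raw term count, and the function names are made up for illustration. The point is only that each shard returns its own top hits and a coordinator merges them into one ranked list.

```python
import heapq

def search_shard(shard_docs, term, k):
    """Score one shard locally and return its top-k (score, doc_id) pairs."""
    hits = [(text.lower().split().count(term), doc_id)
            for doc_id, text in shard_docs.items()]
    hits = [hit for hit in hits if hit[0] > 0]  # keep matching documents only
    return heapq.nlargest(k, hits)

def scatter_gather(shards, term, k):
    """Scatter the query to every shard, then merge the partial results."""
    partial = [search_shard(shard, term, k) for shard in shards]   # scatter
    merged = [hit for hits in partial for hit in hits]             # gather
    return heapq.nlargest(k, merged)

# Two hypothetical shards, each holding part of the index.
shards = [
    {"c1": "java programming basics", "c2": "data mining"},
    {"c3": "java java advanced java", "c4": "python for beginners"},
]
top = scatter_gather(shards, "java", k=2)
print(top)
```

Because every shard only returns its local top k, the coordinator merges at most k hits per shard, which is why the shard count influences query cost.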
10. Basic Search Concepts
Relevance: How closely a document matches a search.
Term Frequency (TF):
How often a term occurs in the text
Inverse Document Frequency (IDF):
A measure of how few documents a term occurs in
Distinctiveness
Score: The numeric expression of relevance
11. Term Frequency (Example)
“Java Programming and Basic Programming Logic”
“Java Programming and Data Mining”
In a search for “Programming”, the first document gets the higher score.
Term  Java  Programming  and  Basic  Logic
TF    1     2            1    1      1
Term  Java  Programming  and  Data  Mining
TF    1     1            1    1     1
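The term counts in this example can be reproduced with a few lines of Python. This is a minimal sketch that uses English renderings of the two course titles and a naive lowercase, whitespace-based split; a real analyzer does more than that.

```python
from collections import Counter

def term_frequencies(text):
    """Count how often each lowercase, whitespace-separated term occurs."""
    return Counter(text.lower().split())

doc1 = "Java Programming and Basic Programming Logic"
doc2 = "Java Programming and Data Mining"

tf1 = term_frequencies(doc1)
tf2 = term_frequencies(doc2)

# The term occurs twice in the first title and once in the second,
# so on term frequency alone the first title matches better.
print(tf1["programming"], tf2["programming"])
```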
12. Inverse Document Frequency (Example)
Scenario:
Suppose the data set consists of 100 documents.
Suppose the term “Java” occurs in 10 of them and “Elasticsearch” in 3.
“Elasticsearch” has the higher IDF value (it is more distinctive).
Result:
In a search for “Java Elasticsearch”, documents containing “Elasticsearch” get a higher score (in the search results
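The scenario can be checked numerically. The sketch below assumes the common textbook definition idf = ln(N / df); the slides do not state a formula, and the scoring used by Lucene differs in its details, so the absolute numbers are only illustrative. What matters is the ordering.

```python
import math

def idf(total_docs, docs_with_term):
    """Textbook inverse document frequency: rarer terms get larger values."""
    return math.log(total_docs / docs_with_term)

TOTAL_DOCS = 100
idf_java = idf(TOTAL_DOCS, 10)           # "Java" occurs in 10 documents
idf_elasticsearch = idf(TOTAL_DOCS, 3)   # "Elasticsearch" occurs in 3

print(round(idf_java, 3), round(idf_elasticsearch, 3))
```

Under this definition the rarer term contributes more to the score, which matches the conclusion on the slide.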
14. Elasticsearch at Udemy
Version 1.5.2
Production, test and development clusters
7 nodes
Multiple shards and replicas
Access via the REST API
Client libraries:
elasticdsl (Python)
Spring Data (Java)
15. Tools Used
HQ plugin
Cluster monitoring and basic management
Datadog/New Relic
Viewing performance and resource usage
Sense
Google Chrome plugin
Sending REST requests to the cluster
Elasticdump
17. Indexing
Documents are read from the database and indexed into Elasticsearch
Real-time indexing:
Changes to the data are detected by the application, and the changed documents are indexed in real time
Periodic indexing:
All documents are periodically updated in Elasticsearch
Because it is done in bulk, it performs better than real-time indexing
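To make the bulk idea concrete, here is a minimal sketch that builds the newline-delimited JSON body accepted by Elasticsearch's bulk endpoint: one action line followed by one source line per document. The index name, document IDs and fields are hypothetical, nothing is sent over the network, and depending on the Elasticsearch version the action line may also need a `_type`.

```python
import json

def build_bulk_body(index_name, docs):
    """Build an NDJSON bulk body: an action line, then the document source."""
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"

# Hypothetical course documents read from the database.
courses = {
    "1": {"title": "Java Programming", "num_students": 1200},
    "2": {"title": "Data Mining", "num_students": 300},
}
body = build_bulk_body("courses", courses)
print(body)
```

Sending many documents in one request amortizes the per-request overhead, which is the reason periodic bulk indexing is cheaper per document than indexing each change individually.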
18. Indexed Fields (Course)
Fields supplied by the instructor:
Title, subtitle, course description, instructor name, ...
Fields related to the quality/performance of the course:
Number of students, rating, reviews, revenue, view count, ...
The category/subcategory/collection the course belongs to
Information about the course price
Tags related to the course
Manual and algorithmic
19. Basic Search Functions
Goal:
Find the documents (courses) that match the search
Show documents that match the search better higher in the search results
Udemy search results
Are ranked by a custom function
Components:
User-supplied fields
20. User-Supplied Fields
Title, subtitle, instructor name, etc.
A custom scoring function, generally based on TF-IDF, is used.
Each field is weighted differently (boosting).
Courses in the language of the searching user are scored higher.
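One way to express per-field weighting is a `multi_match` query with `^` boosts, combined with an optional `term` clause that favors the user's language. The sketch below only builds the request body as a Python dictionary. The field names and boost values are assumptions made for illustration; the slides do not give Udemy's actual fields, weights or scoring function.

```python
import json

def build_boosted_query(text, user_language):
    """Build a query body that weights fields differently (hypothetical values)."""
    return {
        "query": {
            "bool": {
                "must": {
                    "multi_match": {
                        "query": text,
                        # "^3" means a match in the title counts three times as much.
                        "fields": ["title^3", "subtitle^2", "instructor_name"],
                    }
                },
                # Optional clause: courses in the user's language score higher.
                "should": {
                    "term": {"language": {"value": user_language, "boost": 2.0}}
                },
            }
        }
    }

query = build_boosted_query("java programming", "tr")
print(json.dumps(query, indent=2))
```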
21. Quality/Performance Fields
The user ratings, reviews, revenue, views, etc. that a course has received.
Each field affects the score through its own custom function.
The field weights and the function parameters are determined from historical data using machine learning methods
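A rough sketch of how quality signals might modify a text relevance score is shown below. Everything in it is an assumption: the choice of signals, the shape of each function (linear for rating, logarithmic for student count) and the weights are invented for illustration, whereas the slides say the real weights and parameters are learned from historical data.

```python
import math

# Hypothetical weights; in the talk these are learned, not hand-picked.
WEIGHTS = {"rating": 0.5, "num_students": 0.2}

def quality_boost(course):
    """Combine quality signals into a multiplier of at least 1.0."""
    rating_part = WEIGHTS["rating"] * (course["rating"] / 5.0)
    # log1p dampens very large student counts so they do not dominate.
    students_part = WEIGHTS["num_students"] * math.log1p(course["num_students"])
    return 1.0 + rating_part + students_part

def final_score(text_score, course):
    """Scale the text relevance score by the quality multiplier."""
    return text_score * quality_boost(course)

popular = {"rating": 4.8, "num_students": 50000}
new = {"rating": 4.0, "num_students": 20}
print(final_score(1.0, popular), final_score(1.0, new))
```

With the same text relevance, the course with stronger quality signals ranks higher.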
22. Filtering and Grouping
Lets users narrow down the search results and find the course they are looking for more easily
Elasticsearch's filter and aggregation features are used
Filter fields:
Language
Level
Price (paid/free)
Features (subtitles, quizzes, coding exercises)
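Filtering and grouping can be imitated in plain Python to show what the two operations compute. This sketch deliberately does not use the Elasticsearch query DSL: the course records and field names are hypothetical, the filter keeps only exact matches, and the grouping counts results per field value in the way a facet sidebar would.

```python
from collections import Counter

# Hypothetical search results.
courses = [
    {"title": "Java Basics", "language": "en", "level": "beginner", "paid": False},
    {"title": "Advanced Java", "language": "en", "level": "expert", "paid": True},
    {"title": "Java Nyumon", "language": "ja", "level": "beginner", "paid": True},
]

def apply_filters(items, **filters):
    """Keep only the items whose fields equal every requested filter value."""
    return [c for c in items if all(c[f] == v for f, v in filters.items())]

def facet_counts(items, field):
    """Count how many items fall under each value of the given field."""
    return Counter(c[field] for c in items)

english = apply_filters(courses, language="en")
levels = facet_counts(english, "level")
print([c["title"] for c in english], dict(levels))
```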
24. Sorting
Users can sort the search results by certain criteria
Criteria:
Relevance (default)
Price
Rating
Date added
Done with Elasticsearch's sort feature
25. Autocomplete
Helps complete the user's query automatically in the search box.
Information used:
Course titles
Instructor names
Previous popular searches
Here, Elasticsearch's text match functions are used
Prefix matching
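Prefix matching itself is easy to sketch outside Elasticsearch. The example below keeps the candidate phrases in a sorted list and uses binary search to find the first entry that could start with the typed prefix. The phrases are hypothetical, and this is a stand-in for the idea, not the mechanism Elasticsearch uses internally.

```python
import bisect

def build_index(phrases):
    """Lowercase and sort the phrases so equal prefixes sit next to each other."""
    return sorted(p.lower() for p in phrases)

def complete(index, prefix, limit=5):
    """Return up to `limit` phrases that start with the given prefix."""
    prefix = prefix.lower()
    start = bisect.bisect_left(index, prefix)  # first phrase >= prefix
    results = []
    for phrase in index[start:]:
        if not phrase.startswith(prefix) or len(results) == limit:
            break
        results.append(phrase)
    return results

# Hypothetical titles, instructor names and popular searches.
index = build_index(["Java Programming", "JavaScript Basics", "Python", "Data Mining"])
print(complete(index, "jav"))
```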
26. Suggestions (Did you mean?)
Used to offer alternatives when no suitable result is found for a search.
Elasticsearch's ‘suggest’ feature is used.
Similar words that occur more frequently in the documents are offered as suggestions.
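The idea of offering a similar but more frequent word can be sketched with Python's standard `difflib` module. This is only an approximation of the concept: the term counts are hypothetical, the similarity cutoff is an arbitrary choice, and Elasticsearch's own suggesters work differently in detail.

```python
import difflib
from collections import Counter

def suggest(term, term_counts, n=1):
    """Suggest indexed terms that look like `term`, most frequent first."""
    candidates = difflib.get_close_matches(term, list(term_counts), n=5, cutoff=0.7)
    candidates.sort(key=lambda t: term_counts[t], reverse=True)
    return candidates[:n]

# Hypothetical document frequencies of terms in the index.
term_counts = Counter({"python": 120, "elasticsearch": 40, "java": 300})
suggestion = suggest("elasticserch", term_counts)
print(suggestion)
```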
27. Text Analysis and Cleaning
Text usually needs to be analyzed and cleaned:
Fields in documents
Search terms
Elasticsearch analyzers are used for this purpose.
A separate analyzer can be defined for each field
Analyzer functions:
Tokenization: Splitting the text into pieces (tokens)
28. Text Analysis at Udemy
Supported languages:
English
Japanese
Analyzers used
The standard analyzers that ship with Elasticsearch
The ‘Udemy Analyzer’, developed by Udemy
Plugin
The ‘RBL Analyzer’, which tokenizes Japanese
29. Text Analysis Operations
Stop word removal
‘A Java Course for Beginners’ -> ‘Java’, ‘Course’, ‘Beginners’
Words containing special characters
‘C++’, ‘PL/SQL’
Conversion of non-ASCII characters
‘ÇAĞDAŞ’ -> ‘CAGDAS’, ‘ÇAĞDAŞ’
Synonym handling
‘js’, ‘java script’, ‘javascript’ -> ‘javascript’
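These operations can be chained into a small pipeline. The sketch below is a simplified stand-in for an Elasticsearch analyzer, not the Udemy Analyzer itself: the stop word and synonym lists are hypothetical, tokens are lowercased (an added assumption), splitting is on whitespace so that terms such as C++ and PL/SQL survive intact, and only the folded form of a non-ASCII token is emitted, whereas the slide shows the original form being kept as well.

```python
import unicodedata

STOP_WORDS = {"a", "an", "the", "for", "and"}                # hypothetical list
SYNONYMS = {"js": "javascript", "java script": "javascript"}  # hypothetical list

def ascii_fold(token):
    """Strip diacritics by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def analyze(text):
    """Lowercase, map synonyms, drop stop words and fold to ASCII."""
    text = text.lower()
    # Multi-word synonyms must be replaced before the text is split into tokens.
    for phrase, canonical in SYNONYMS.items():
        if " " in phrase:
            text = text.replace(phrase, canonical)
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    folded = [ascii_fold(t) for t in tokens]
    return [SYNONYMS.get(t, t) for t in folded]

print(analyze("A Java Course for Beginners"))
print(analyze("C++ and PL/SQL"))
print(ascii_fold("ÇAĞDAŞ"))
```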
30. Multi-Language Support (Japanese)
Japanese document text is indexed in separate fields using the Japanese analyzer (RBL).
Japanese search terms are queried against these fields.
The relevant Rosette API is used to detect the language of the search term.
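The routing step can be sketched as follows. The language check here is a crude stand-in based on Unicode ranges and is not the Rosette API mentioned on the slide; it would also classify Chinese text as Japanese. The `_ja` field suffix and the field names are hypothetical.

```python
def looks_japanese(text):
    """Crude stand-in for language detection: look for kana or CJK ideographs."""
    for ch in text:
        code = ord(ch)
        if 0x3040 <= code <= 0x30FF:   # hiragana and katakana
            return True
        if 0x4E00 <= code <= 0x9FFF:   # CJK unified ideographs
            return True
    return False

def fields_for(query):
    """Pick the set of fields to search, depending on the query language."""
    suffix = "_ja" if looks_japanese(query) else ""
    return [f"title{suffix}", f"subtitle{suffix}"]

print(fields_for("java"))
print(fields_for("プログラミング"))
```

Keeping the Japanese text in its own fields means each field can be analyzed with the analyzer that suits its language, and the query only has to choose which fields to hit.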
31. Extracting TF-IDF Values
Requirement:
Obtaining the TF and IDF values of the search results so that the search algorithm can be personalized
Problem:
Elasticsearch provides these values only when explain mode is on.
This does not perform well
Solution:
Elasticsearch Agent
By changing the behavior of Elasticsearch at runtime through bytecode instrumentation
32. Future Plans
Upgrading Elasticsearch from version 1.5.2 to 2.x.y
Changing the filter structure
Rewriting the Elasticsearch Agent for 2.x.y, or as a plugin
Cluster optimizations
Improvements to text analysis
Improvements to the scoring function