2. Who am I?
Name:
Eiji Shinohara / 篠原 英治 / @shinodogg
Role:
AWS Solutions Architect
Subject Matter Expert
・Amazon CloudSearch
・Amazon Elasticsearch Service
3. Which Search Engine/Service do you use?
• Apache Solr
• Elasticsearch
• Amazon CloudSearch
• Amazon Elasticsearch Service
4. On top of Apache Lucene
• Apache Solr
• Elasticsearch
• Amazon CloudSearch
• Amazon Elasticsearch Service
5. Have you used Apache Lucene?
• Apache Lucene is a free and open-source information retrieval software library, originally written in Java by Doug Cutting.
• It is supported by the Apache Software Foundation and is released under the Apache Software License.
https://en.wikipedia.org/wiki/Lucene
6. Doug Cutting – Hadoop/Nutch/Lucene
• Hadoop: MapReduce
• "The name my kid gave a stuffed yellow elephant."
• Nutch: Crawler
• "Nutch was a word my oldest son used when he was two; I think it came from 'lunch'."
• Lucene: Search
• Lucene is Doug Cutting's wife's middle name, and her maternal grandmother's first name.
http://www.mwsoft.jp/programming/hadoop/where_come_from.html
Maybe the most proper naming :)
9. Apache Lucene
• Full-text search
• Easy to use
1. Index
• new Document → addDocument → commit
2. Query
• Generate a query string
3. Search
• Search and fetch the matching documents
4. Display
• Get the contents of the fetched documents to show
http://www.lucenetutorial.com/lucene-in-5-minutes.html
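The four steps above can be sketched without Lucene at all. The following is a toy in-memory inverted index written against the JDK only; it is not the Lucene API, and the class name ToyIndex, its methods, and the sample titles are hypothetical names chosen for illustration. It only mirrors the flow: add documents (index), turn a query string into terms (query), intersect posting lists (search), and read back stored contents (display).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy in-memory inverted index (hypothetical class, NOT the Lucene API).
class ToyIndex {
    private final List<String> docs = new ArrayList<>();                      // stored contents
    private final Map<String, TreeSet<Integer>> postings = new TreeMap<>();   // term -> doc ids

    // Lowercase the text and split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // 1. Index: store the document and record each term's posting.
    int addDocument(String contents) {
        int id = docs.size();
        docs.add(contents);
        for (String term : tokenize(contents)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    // 2 + 3. Query and search: every query term must occur (AND semantics).
    List<Integer> search(String queryString) {
        TreeSet<Integer> hits = null;
        for (String term : tokenize(queryString)) {
            TreeSet<Integer> ids = postings.getOrDefault(term, new TreeSet<>());
            if (hits == null) hits = new TreeSet<>(ids);
            else hits.retainAll(ids);
        }
        return hits == null ? new ArrayList<>() : new ArrayList<>(hits);
    }

    // 4. Display: fetch the stored contents of a hit.
    String get(int id) {
        return docs.get(id);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addDocument("Lucene in Action");
        index.addDocument("Managing Gigabytes");
        index.addDocument("Lucene for Dummies");
        for (int id : index.search("lucene")) {
            System.out.println(id + ": " + index.get(id));
        }
    }
}
```

Running main prints "0: Lucene in Action" and "2: Lucene for Dummies". Real Lucene adds analyzers, scoring, and on-disk segments on top of this basic idea.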
10. Evernote and LinkedIn are using Lucene
• With their own thin HTTP wrappers
• Presentations at Lucene/Solr Revolution 2014
https://www.youtube.com/watch?v=drOmahIie6c
https://www.youtube.com/watch?v=8O7cF75intk
11. Build your own Search engine?
• Some companies are doing that
http://www.slideshare.net/lucidworks/galene-linkedins-search-architecture-presented-by-diego-buthay-sriram-sankar-linkedin/8
13. Apache Lucene入門 (Introduction to Apache Lucene), in Japanese
http://rondhuit.com/lucene-for-bea-060710.pdf
http://www.amazon.co.jp/dp/4774127809
17. Lucene in Action chap5: Term Vector (2)
Calculate Document Similarity
http://mocobeta-backup.tumblr.com/post/49779999073/
18. Lucene in Action chap5: Term Vector (2)
Calculate Document Similarity
• Just tried running it on my local MacBook Air :)
• Created 2 classes
• Indexer
• Indexing some documents
• CalculationSimilarityTester
• Comparing 2 documents
• Calculate cosine similarity
• Using Luke to browse the index
• https://github.com/DmitryKey/luke
• Uchida-san is also a Luke committer
19. Lucene 6.0
• I had a Lucene 5.5 environment, but...
• Invalid directory at the location, check console for more information. Last exception:
• java.lang.IllegalArgumentException: Could not load codec 'Lucene60'. Did you forget to add lucene-backward-codecs.jar?
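21. Indexer
import java.io.File;
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.ja.JapaneseAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexOptions;

public class Indexer {
    public static void main(String[] args) throws IOException {
        // Analyzer for Japanese text
        Analyzer analyzer = new JapaneseAnalyzer();
        // ... (omitted on the slide; `writer` is set up here) ...
        File[] files = new File("/Users/xxx/lucene_test/docs/").listFiles();
        for (File file : files) {
            Document doc = new Document();
            // ... (omitted on the slide) ...
            // Field type that stores the text and keeps term vectors,
            // which the similarity calculation reads later
            FieldType contentsType = new FieldType();
            contentsType.setStored(true);
            contentsType.setTokenized(true);
            contentsType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
            contentsType.setStoreTermVectors(true);
            // ... (omitted on the slide; `sb` holds the file's text) ...
            doc.add(new Field("contents", sb.toString(), contentsType));
            writer.addDocument(doc);
        }
        writer.commit();
        writer.close();
    }
}
• Read file → add Document → commit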
22. Indexer
• Files
• Found examples on the internet :)
• http://www.pahoo.org/e-soul/webtech/php06/php06-21-01.shtm
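PHP: Hypertext Preprocessor(ピー・エイチ・ピー ハイパーテキスト プリプロセッサー)とは、動的に HTML データを生成することによって、動的なウェブページを実現することを主な目的としたプログラミング言語、およびその言語処理系である。PHP は、HTML 埋め込み型のサーバサイド・スクリプト言語として分類される。この言語処理系自体は、C言語で記述されている。
(English: PHP: Hypertext Preprocessor is a programming language, and its language implementation, whose main purpose is to realize dynamic web pages by dynamically generating HTML data. PHP is classified as an HTML-embedded server-side scripting language. The language implementation itself is written in C.)
PHP(Hypertext Preprocessor;ピー・エイチ・ピー)とは、動的に HTML データを生成することによって、動的なウェブページを実現すること目的としたプログラミング言語である。PHP は、HTML 埋め込み型のサーバサイド・スクリプト言語の一種で、処理系自体は C言語で記述されている。
(English: PHP (Hypertext Preprocessor) is a programming language whose purpose is to realize dynamic web pages by dynamically generating HTML data. PHP is a kind of HTML-embedded server-side scripting language, and the implementation itself is written in C.)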
23. Indexer
• Files
• Found examples on the internet :)
• http://www.fisproject.jp/2015/01/cosine_similarity/
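• Exactly the same (the two Japanese sentences below are identical)
A Cat sat on the mat.
Cats are sitting on the mat.
人口から無作為に選択されて、人口に関する仮説を試験するために使用される項目となっております。
人口から無作為に選択されて、人口に関する仮説を試験するために使用される項目となっております。
(English: These are items selected at random from a population and used to test hypotheses about the population.)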
30. Calculate Document Similarity
• mocobeta/CalcCosineSimilarityTest.java
• https://gist.github.com/mocobeta/5525864
• Search for the documents in the index
• Compute TF-IDF from the term vectors
• TF-IDF
• How important a word is to a document in a collection or corpus
• TF: how frequently a term occurs in a document
• IDF: a measure of how rare a term is across the collection
• Get the cosine similarity
• A value closer to 1 means more similar (equivalently, a lower angle or cosine distance)
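As a rough illustration of the calculation described above, here is a minimal JDK-only sketch that builds TF-IDF vectors and compares them with cosine similarity. It is not the code from the gist and does not use Lucene term vectors. The class name TfIdfCosine, the simple tokenizer, and the smoothed IDF formula log((N + 1) / (df + 1)) + 1 are assumptions made for this example. The first two sentences come from the sample files; the third is a made-up, unrelated document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; not part of Lucene.
class TfIdfCosine {
    // Lowercase the text and split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // TF-IDF weights of one tokenized document against a tokenized corpus.
    static Map<String, Double> tfIdf(List<String> doc, List<List<String>> corpus) {
        Map<String, Double> weights = new HashMap<>();
        for (String term : doc) weights.merge(term, 1.0, Double::sum); // raw term frequency
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            // Document frequency: number of corpus documents containing the term.
            long df = corpus.stream().filter(d -> d.contains(e.getKey())).count();
            // Smoothed IDF (an assumption of this sketch, not Lucene's exact formula).
            double idf = Math.log((corpus.size() + 1.0) / (df + 1.0)) + 1.0;
            e.setValue(e.getValue() * idf);
        }
        return weights;
    }

    // Cosine similarity of two sparse vectors; 1.0 means same direction, 0.0 means no shared terms.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
        }
        for (double v : b.values()) normB += v * v;
        if (normA == 0 || normB == 0) return 0.0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        List<List<String>> corpus = new ArrayList<>();
        for (String text : Arrays.asList(
                "A Cat sat on the mat.",
                "Cats are sitting on the mat.",
                "Lucene stores term vectors")) {
            corpus.add(tokenize(text));
        }
        Map<String, Double> a = tfIdf(corpus.get(0), corpus);
        Map<String, Double> b = tfIdf(corpus.get(1), corpus);
        System.out.println("doc0 vs doc0: " + cosine(a, a));
        System.out.println("doc0 vs doc1: " + cosine(a, b));
    }
}
```

Because this tokenizer does no stemming, "Cat" and "Cats" count as different terms, so the two cat sentences only overlap on "on", "the", and "mat". A language-aware analyzer would likely score the pair higher.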
39. N-best
• Contributed by Yahoo! Japan
http://www.slideshare.net/techblogyahoo/17lucenesolr-solrjp-apache-lucene-solrnbest
42. Nihongo Muzukashii-ne… (Japanese is hard, isn't it…)
• Do we need more analysis, or to maintain dictionaries?
http://www.slideshare.net/techblogyahoo/17lucenesolr-solrjp-apache-lucene-solrnbest
43. Nihongo Muzukashii-ne… (Japanese is hard, isn't it…)
• A search for “一眼レフ” (single-lens reflex) doesn't hit?
http://blog.yoslab.com/entry/2014/09/12/005207