The document discusses graph databases and their properties. Graph databases are structured to store graph-based data by using nodes and edges to represent entities and their relationships. They are well-suited for applications with complex relationships between entities that can be modeled as graphs, such as social networks. Key graph database technologies mentioned include Neo4j, OrientDB, and TinkerPop which provides graph traversal capabilities.
This document discusses messaging queues and platforms. It begins with an introduction to messaging queues and their core components. It then provides a table comparing 8 popular open source messaging platforms: Apache Kafka, ActiveMQ, RabbitMQ, NATS, NSQ, Redis, ZeroMQ, and Nanomsg. The document discusses using Apache Kafka for streaming and integration with Google Pub/Sub, Dataflow, and BigQuery. It also covers benchmark testing of these platforms, comparing throughput and latency. Finally, it emphasizes that messaging queues can help applications by allowing producers and consumers to communicate asynchronously.
活用段階に入ったNoSQLですがまだまだ実際どう使えるのかご存じ無い方も多いのでは無いでしょうか。当セッションでは、MapR-DB(Hbase互換のNoSQL)が企業でどう活用されているのか、インドのマイナンバー事例や国内事例を元に実際の使い方のイメージと技術的な裏付けをご説明します。2015年6月10〜12日に開催されたdb tech showcase Tokyo 2015での講演資料です。
This document discusses messaging queues and platforms. It begins with an introduction to messaging queues and their core components. It then provides a table comparing 8 popular open source messaging platforms: Apache Kafka, ActiveMQ, RabbitMQ, NATS, NSQ, Redis, ZeroMQ, and Nanomsg. The document discusses using Apache Kafka for streaming and integration with Google Pub/Sub, Dataflow, and BigQuery. It also covers benchmark testing of these platforms, comparing throughput and latency. Finally, it emphasizes that messaging queues can help applications by allowing producers and consumers to communicate asynchronously.
活用段階に入ったNoSQLですがまだまだ実際どう使えるのかご存じ無い方も多いのでは無いでしょうか。当セッションでは、MapR-DB(Hbase互換のNoSQL)が企業でどう活用されているのか、インドのマイナンバー事例や国内事例を元に実際の使い方のイメージと技術的な裏付けをご説明します。2015年6月10〜12日に開催されたdb tech showcase Tokyo 2015での講演資料です。
This document discusses strategies for optimizing access to large "master data" files in PHP applications. It describes converting master data files from PHP arrays to tab-separated value (TSV) files to reduce loading time. Benchmark tests show the TSV format reduces file size by over 50% and loading time from 70 milliseconds to 7 milliseconds without OPcache. Accessing rows as arrays by splitting on tabs is 3 times slower but still very fast at over 350,000 gets per second. The TSV optimization has been used successfully in production applications.
The document outlines the divisions and focus areas of a large company, with most resources allocated to technology R&D, including deep learning and CNN. Other divisions include infrastructure, promotions, UI design, SEO, big data, IT, recruiting, staffing services, and administration.
Company Recommendation for New Graduates via Implicit Feedback Multiple Matri...Recruit Technologies
This document describes a method for recommending suitable companies to new graduates based on their browsing history and other data. It proposes using implicit matrix factorization of browsing data along with Bayesian optimization of hyperparameters to focus recommendations on less popular, low-browsed companies. An evaluation on Japanese student and company data showed this approach achieved higher recall of suitable matches for low-browsed companies compared to other methods, especially when incorporating additional student and company profile information.
The document discusses the BIG DATA department of Recruit Technologies. It describes how the department has used Amazon's Elastic MapReduce (EMR) and Elastic Compute Cloud (EC2) services since 2010 to perform big data analytics on datasets using Hadoop. It provides details on how the department configures and manages EMR clusters on EC2 spot instances to perform tasks like log analysis and recommendation algorithms in a cost-effective manner. Various strategies are discussed around optimizing the use of spot instances and EMR to reduce costs while managing reliability.