An overview of all the different content related technologies at the Apache Software Foundation
Talk from ApacheCon NA 2010 in Atlanta in November 2010
DrupalCampLA 2014 - Drupal backend performance and scalability (cherryhillco)
This document discusses various techniques for optimizing Drupal backend performance and scalability. It covers diagnosing issues with tools like Apache Benchmark and Munin; optimizing hardware, web servers and database servers (for example with Nginx, Varnish and MySQL tuning); alternative databases like MongoDB; and PHP optimizations like opcode caching and HHVM. The goal is to provide strategies to handle more traffic, improve page response times, and minimize downtime through infrastructure improvements and code optimizations.
This document provides an overview of a workshop on using PHP and MySQL for library applications. It begins with introductions and outlines assumptions, goals and an overview of web applications. It then defines what PHP is and its benefits for database management. Key points covered include how to set up a MySQL database, how to input and maintain data, and how to connect PHP to the database. The document demonstrates basic PHP syntax and functions like variables, operators, forms and comments. Exercises are provided to help participants practice what they learned.
Type safety is extremely important in any application built around a stream / queue. Type definition and evolution can either be built into the application or delegated to the data layer to support out of the box, allowing the application to concentrate on business logic rather than on data storage and evolution. It is this property of the good old relational databases (among others) that makes them a favourite even alongside all the modern NoSQL databases. Modern software architectures require asynchronous communication (via stream / queue). While data store and query design change with asynchronous communication, type safety is still equally important.
In this slide deck, used for an ApacheCon 2021 talk, we go over ways in which one can enforce structure (schema) on streaming data, using Apache Pulsar as the example. Apache Pulsar offers server-side as well as client-side support for structured streaming. We have been using Pulsar for asynchronous communication among microservices in our Nutanix Beam and Flow Security Central apps for over 1.5 years in production. This deck presents the technical details of what a schema is, how to represent a schema, what is available on the Apache Pulsar server and client side, how we have used Pulsar’s schema support to build our use cases, and our learnings from them.
Large Scale ETL for Hadoop and Cloudera Search using Morphlines (whoschek)
Cloudera Morphlines is a new, embeddable, open source Java framework that reduces the time and skills necessary to integrate and build Hadoop applications that extract, transform, and load data into Apache Solr, Apache HBase, HDFS, enterprise data warehouses, analytic online dashboards, or other consumers. If you want to integrate, build, or facilitate streaming or batch transformation pipelines without programming and without MapReduce skills, and get the job done with a minimum amount of fuss and support costs, Morphlines is for you.
In this talk, you'll get an overview of Morphlines internals and explore sample use cases that can be widely applied.
This document discusses using big data tools to build a fraud detection system. It outlines using Azure infrastructure to set up a Hadoop cluster with HDFS, HBase, Kafka and Spark. Mock transaction data will be generated and sent to Kafka. Spark jobs will process the data in batches, identifying potentially fraudulent transactions and writing them to an HBase table. The data will be visualized using Zeppelin notebooks querying Phoenix SQL on HBase. This will allow analysts to further investigate potential fraud patterns in near real-time.
Lessons from {distributed,remote,virtual} communities and companies (Colin Charles)
A last minute talk for the people at DevOps Amsterdam, happening around the same time as O'Reilly Velocity Amsterdam 2016. Here are lessons one can learn from distributed/remote/virtual communities and companies from someone that has spent a long time being remote and distributed.
Databases require capacity planning (and to those coming from traditional RDBMS solutions, this can be thought of as a sizing guide). Capacity planning prevents resource exhaustion. Capacity planning can be hard. This talk has a heavier leaning on MySQL, but the concepts and addendum will help with any other data store.
This is an introduction to relational and non-relational databases and how their performance affects scaling a web application.
This is a recording of a guest lecture I gave at the University of Texas School of Information.
In this talk I address the technologies and tools Gowalla (gowalla.com) uses including memcache, redis and cassandra.
Find more on my blog:
http://schneems.com
HBase Status Report - Hadoop Summit Europe 2014 (larsgeorge)
This document provides a summary of new features and improvements in recent versions of Apache HBase, a distributed, scalable, big data store. It discusses major changes and enhancements in HBase 0.92+, 0.94+, and 0.96+, including new HFile formats, coprocessors, caching improvements, performance tuning, and more. The document is intended to bring readers up to date on the current state and capabilities of HBase.
My talk at ScaleConf 2017 in Cape Town on some tips and tactics for scaling WordPress, with reference to WordPress.com and the container-based VIP Go platform.
Video of my talk is here: https://www.youtube.com/watch?v=cs0DcY80spw
Redis is an in-memory key-value data store that can be used for caching, sessions, queues, leaderboards, and more. It provides fast performance due to being memory-resident and supporting different data structures like strings, hashes, lists, sets, and sorted sets. Redis is useful for read-heavy and real-time applications but may not be suitable if data does not fit in memory or for relational data needs. The presentation discusses using Redis with PHP and Symfony, data sharding strategies, and war stories from a social game with 7.5M daily users.
SharePoint Saturday The Conference 2011 - SP2010 Performance (Brian Culver)
Is your farm struggling to serve your organization? How long is it taking between page requests? Where is the bottleneck in your farm? Is your SQL Server tuned properly? Worried about upgrading due to poor performance? We will look at various tools for analyzing and measuring performance of your farm. We will look at simple SharePoint and IIS configuration options to instantly improve performance. I will discuss advanced approaches for analyzing, measuring and implementing optimizations in your farm.
This document summarizes AdGooroo's experience deploying Hadoop in a Windows environment. Key points include:
- Hadoop and Windows can integrate but require workarounds like NFS for data transfer between Linux and Windows.
- Tools like Hive and Sqoop worked as expected while others like Flume were overkill.
- Unexpected issues arose with data serialization formats like AVRO not being fully compatible between .NET and Java.
- The learning curve is steep but can be flattened by taking things one component at a time.
Kafka Summit SF Apr 26 2016 - Generating Real-time Recommendations with NiFi,... (Chris Fregly)
This document summarizes a presentation about generating real-time streaming recommendations using NiFi, Kafka, and Spark ML. The presentation demonstrates using NiFi to ingest data from HTTP requests, enrich it with geo data, and write it to a Kafka topic. It then shows how to create a Spark Streaming application that reads from Kafka to perform incremental matrix factorization recommendations in real-time and handles failures using circuit breakers. The presentation also provides an overview of Netflix's large-scale real-time recommendation pipeline.
Sizing an Alfresco infrastructure has always been an interesting topic with lots of unanswered questions. There is no perfect formula that can accurately define the right sizing for your architecture given your use case. However, we can provide you with valuable guidance on how to size your Alfresco solution, by asking the right questions, collecting the right numbers, and making the right assumptions in a very interesting sizing exercise.
How many Alfresco servers will you need in your Alfresco cluster? How many CPUs/cores do you need on those servers to handle your estimated user concurrency? How do you estimate the sizing and growth of your storage? How much memory do you need on your Solr servers? How many Solr servers do you need to get the response times you require? What are the golden rules that can drive and maintain the success of an Alfresco project?
This document provides an introduction and overview of Apache Spark, a lightning-fast cluster computing framework. It discusses Spark's ecosystem, how it differs from Hadoop MapReduce, where it shines, and how easy it is to install and start learning; it includes some small code demos and provides additional resources for information. The presentation introduces Spark and its core concepts, compares it to Hadoop MapReduce in areas like speed, usability, tools, and deployment, demonstrates how to use Spark SQL with an example, and shows a visualization demo. It aims to provide attendees with a high-level understanding of Spark without being a training class or workshop.
Spotify: Horizontal Scalability for Great Success (Nick Barkas)
The document discusses Spotify's use of horizontal scalability to handle its large user and music catalog sizes. It describes how Spotify scales out by distributing work across separate services and handling shared data through techniques like sharding and eventual consistency. Key approaches Spotify uses include running multiple instances of each service, using load balancers to distribute requests, storing only necessary data in globally consistent databases, and implementing distributed hash tables for service discovery.
Apachecon Europe 2012: Operating HBase - Things you need to know (Christian Gügi)
This document provides an overview of important concepts for operating HBase, including:
- HBase stores data in column families stored as files on disk and writes to memory before flushing to disk.
- Manual and automatic splitting of regions is covered, as well as challenges of improper splitting.
- Tools for monitoring, debugging, and visualizing HBase operations are discussed.
- Key lessons focus on proper data modeling, extensive monitoring, and understanding the whole Hadoop ecosystem.
Quick Git overview presented at a Rackspace Tech Night - it is not a tutorial, no previous knowledge is required, we are simply looking to introduce the concepts of version control and cover some good practice steps when dealing with repositories.
NoSQL and SQL - Why Choose? Enjoy the best of both worlds with MySQL (Andrew Morgan)
There's a lot of excitement around NoSQL data stores, with the promise of simple access patterns, flexible schemas, scalability and high availability. The downside comes in the form of losing ACID transactions, consistency, flexible queries and data integrity checks. What if you could have the best of both worlds? This session shows how MySQL Cluster provides simultaneous SQL and native NoSQL access to your data, whether through a simple key-value API (Memcached), REST, JavaScript, Java or C++. You will hear how the MySQL Cluster architecture delivers in-memory real-time performance, 99.999% availability, on-line maintenance and linear, horizontal scalability through transparent auto-sharding.
The environment in which your EECMS lives is as important as what can be seen by your clients in their browser. A solid foundation is key to the overall performance, scalability and security of your site. Building on over a decade of server optimization experience, extensive benchmarking and some custom ExpressionEngine extensions this session will show you how to make sure your ExpressionEngine install is ready for prime time.
Data is the fuel for the idea economy, and being data-driven is essential for businesses to be competitive. HPE works with all the Hadoop partners to deliver packaged solutions to become data driven. Join us in this session and you’ll hear about HPE’s Enterprise-grade Hadoop solution, which encompasses the following:
-Infrastructure – Two industrialized solutions optimized for Hadoop; a standard solution with co-located storage and compute and an elastic solution which lets you scale storage and compute independently to enable data sharing and prevent Hadoop cluster sprawl.
-Software – A choice of all popular Hadoop distributions, and Hadoop ecosystem components like Spark and more. And a comprehensive utility to manage your Hadoop cluster infrastructure.
-Services – HPE’s data center experts have designed some of the largest Hadoop clusters in the world and can help you design the right Hadoop infrastructure to avoid performance issues and future proof you against Hadoop cluster sprawl.
-Add-on solutions – Hadoop needs more to fill in the gaps. HPE partners with the right ecosystem partners to bring you solutions such as industrial-grade SQL on Hadoop with Vertica, data encryption with SecureData, SAP ecosystem with SAP HANA VORA, multitenancy with Blue Data, object storage with Scality and more.
Redis is a fast, in-memory key-value store that can be used as a cache, message broker, and centralized locking system. It stores data in memory for high performance, but can also asynchronously write data to disk for persistence. Redis supports data types like strings, lists, sets and sorted sets. It enables high availability through master-slave replication and horizontal scaling through sharding of data across multiple nodes.
The rise of NoSQL is characterized by confusion and ambiguity; very much like any fast-emerging organic movement in the absence of well-defined standards and adequate software solutions. Whether you are a developer or an architect, many questions come to mind when faced with the decision of where your data should be stored and how it should be managed. The following are some of these questions: What does the rise of all these NoSQL technologies mean to my enterprise? What is NoSQL to begin with? Does it mean "No SQL"? Could this be just another fad? Is it a good idea to bet the future of my enterprise on these new exotic technologies and simply abandon proven mature Relational DataBase Management Systems (RDBMS)? How scalable is scalable? Assuming that I am sold, how do I choose the one that fits my needs best? Is there a middle ground somewhere? What is this Polyglot Persistence I hear about? The answers to these questions and many more are the subject of this talk, along with a survey of the most popular NoSQL technologies. Be there or be square.
This document provides an overview of a Drupal training covering various topics from September 12-20, 2014. The training will introduce participants to core Drupal concepts and components including nodes, content types, taxonomies, views, panels, modules, themes, and the database layer. It will cover setting up a development environment, installing Drupal, configuring the system, and extending Drupal through custom modules and themes. Participants will learn how Drupal handles user requests and its event-driven hook system. The document also provides contact information for the trainer.
This document summarizes a presentation on using SQL Server Integration Services (SSIS) with HDInsight. It introduces Tillmann Eitelberg and Oliver Engels, who are experts on SSIS and HDInsight. The agenda covers traditional ETL processes, challenges of big data, useful Apache Hadoop components for ETL, clarifying statements about Hadoop and ETL, using Hadoop in the ETL process, how SSIS is more than just an ETL tool, tools for working with HDInsight, getting started with Azure HDInsight, and using SSIS to load and transform data on HDInsight clusters.
If You Have The Content, Then Apache Has The Technology! (gagravarr)
This document provides a summary of 46 Apache content-related projects and 8 content-related incubating projects. It discusses projects for transforming and reading content like Apache PDFBox, POI, and Tika; projects for text and language analysis like UIMA, OpenNLP, and Mahout; projects that work with structured data and linked data like Any23, Stanbol, and Jena; projects for data management and processing on Hadoop like MRQL, DataFu, and Falcon; projects for serving content like HTTPD Server, TrafficServer, and Tomcat; projects that focus on generating content like OpenOffice, Forrest, and Abdera; and projects for working with hosted content like Chemistry and ManifoldCF.
This document discusses semantic annotation using custom vocabularies. It introduces Gabriel Dragomir and provides background on semantic web and linked data. It then describes Apache Stanbol, a framework for semantic annotation of documents. Stanbol allows modular processing of documents using configurable workflows and vocabularies. The document outlines Stanbol's architecture and components. It also discusses integrating Stanbol with Drupal for semantic indexing and annotation of content. A demo is proposed to index Drupal data in Stanbol and annotate entities using DBPedia and a custom semantic web vocabulary.
A talk given by Ted Dunning on February 2013 on Apache Drill, an open-source community-driven project to provide easy, dependable, fast and flexible ad hoc query capabilities.
Markup languages and warp-speed documentation (Lois Patterson)
The presentation discusses how software development has moved towards more frequent releases through DevOps practices. This requires documentation to also be updated quickly. Markup languages can help by allowing many contributors to collaborate easily on documentation. Specific markup languages mentioned include reStructuredText and Markdown, which can be processed by tools like Sphinx to generate documentation from plain text files. The presentation demonstrates how to use reStructuredText and emphasizes that markup languages, collaborative tools like GitHub, and automation are key to supporting modern rapid software development practices.
This presentation demonstrates how QueryPath can be used within Drupal to integrate web services and create rich mash-ups.
The "official" DrupalCon Paris video of this presentation can be found here: http://technosophos.com/content/querypath-mashups-and-web-services-video
The document discusses various topics related to web development including Java principles, Spring frameworks, PHP, high-load web applications, mobile backend as a service (mBaas), web frameworks, Java web development frameworks like JSF and GWT, rendering on the server-side vs client-side, distribution of work between designers and developers, web browsers and their support for HTML5 and CSS3, programming languages, GUI frameworks, AngularJS, testing tools like JUnit, and build tools like Maven, Ant, and Ivy.
Integrating Apache Pulsar with Big Data Ecosystem (StreamNative)
At the Apache Pulsar Beijing Meetup, Yijieshen gave a presentation on the current state of Apache Pulsar's integration with the Big Data ecosystem. He explains why and how Pulsar fits into current big data computing and query engines, and how Pulsar integrates with Spark, Flink and Presto for a unified data processing system.
Building Scalable Big Data Infrastructure Using Open Source Software Presenta... (ssuserd3a367)
1) StumbleUpon uses open source tools like Kafka, HBase, Hive and Pig to build a scalable big data infrastructure to process large amounts of data from its services in real-time and batch.
2) Data is collected from various services using Kafka and stored in HBase for real-time analytics. Batch processing is done using Pig and data is loaded into Hive for ad-hoc querying.
3) The infrastructure powers various applications like recommendations, ads and business intelligence dashboards.
1. Apache Spark is an open source cluster computing framework for large-scale data processing. It is compatible with Hadoop and provides APIs for SQL, streaming, machine learning, and graph processing.
2. Over 3000 companies use Spark, including Microsoft, Uber, Pinterest, and Amazon. It can run on standalone clusters, EC2, YARN, and Mesos.
3. Spark SQL, Streaming, and MLlib allow for SQL queries, streaming analytics, and machine learning at scale using Spark's APIs which are inspired by Python/R data frames and scikit-learn.
Apache Arrow -- Cross-language development platform for in-memory data (Wes McKinney)
Wes McKinney is the creator of Python's pandas project and a primary developer of Apache Arrow, Apache Parquet, and other open-source projects. Apache Arrow is an open-source cross-language development platform for in-memory analytics that aims to improve data science tools. It provides a shared standard for memory interoperability and computation across languages through its columnar memory format and libraries. Apache Arrow has growing adoption in data science systems and is working to expand language support and computational capabilities.
Introduction to Hadoop Ecosystem was presented to Lansing Java User Group on 2/17/2015 by Vijay Mandava and Lan Jiang. The demo was built on top of HDP 2.2 and AWS cloud.
This document summarizes DreamObjects, an object storage platform powered by Ceph. It discusses the hardware used in storage and support nodes, including Intel and AMD processors, RAM, disks, and networking components. The document also provides details on Ceph configuration, including replication, CRUSH mapping, OSD configuration, and application tuning. Monitoring tools discussed include Chef, pdsh, Sensu, collectd, graphite, logstash and Jenkins, along with future plans.
High Voltage - Building Static Sites With Wordpress-Managed Content (Nicolle Morton)
WordPress evolved from a simple blog platform into a full-fledged content management system. It is now evolving beyond that into an application development framework. It is a new era for WordPress, one made possible in part by the WP-API plugin. The plugin bolts a REST API on top of the WordPress platform, allowing for integration of WordPress with other systems.
WP-API can be leveraged in many ways. For example, there is a lot of excitement around using WordPress as a backend for single page web apps and mobile apps. But the possibilities don’t end there. In this talk, we will explore the use of WP-API to integrate WordPress-managed content with static site generators.
Static site generators and flat-file CMSs have been growing in popularity over the past few years, due largely to developer productivity, reliability, security, performance and ease-of-deployment. They are a compelling alternative but compromises must be made to realize the benefits. It doesn’t have to be an either-or decision. We will explore strategies for using WordPress as a collaborative writing room – similar to proprietary alternatives like Prismic.io and Contentful. And we will explore strategies for building static sites using that content.
Presented at WordCamp Hamilton 2015 - By Nick Kenyeres - Director of Technology at Wise & Hammer Inc.
With the public confession of Facebook, HBase is on everyone's lips when it comes to the discussion around the new "NoSQL" area of databases. In this talk, Lars will introduce and present a comprehensive overview of HBase. This includes the history of HBase, the underlying architecture, available interfaces, and integration with Hadoop.
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv (larsgeorge)
This talk is about showing the complexity in building a data pipeline in Hadoop, starting with the technology aspect and then correlating it to the skillsets of current Hadoop adopters.
Solr + Hadoop: Interactive Search for Hadoop (gregchanan)
This document discusses Cloudera Search, which integrates Apache Solr with Cloudera's distribution of Apache Hadoop (CDH) to provide interactive search capabilities. It describes the architecture of Cloudera Search, including components like Solr, SolrCloud, and Morphlines for extraction and transformation. Methods for indexing data in real-time using Flume or batch using MapReduce are presented. The document also covers querying, security features like Kerberos authentication and collection-level authorization using Sentry, and concludes by describing how to obtain Cloudera Search.
Scala and Spark are Ideal for Big Data - Data Science Pop-up Seattle (Domino Data Lab)
Scala and Spark are each great tools for data processing and they work well together. They can process data via small simple interactive queries as well as in very large highly-available and scalable production systems. They provide an integrated framework for an ever growing wide range of data processing capabilities. We examine the reasons for this and also look a couple of simple data processing examples written in Scala. Presented by John Nestor, Sr Architect at 47 Degrees.
In this session we will have a look at the different Caching options in Lucee and introduce a new tool called ArgusCache, which will allow you to tune your applications, WITHOUT touching the source code.
Turning XML to XLS on the JVM, without loosing your Sanity, with Groovy (gagravarr)
This document discusses using the Groovy programming language to transform XML data into an Excel spreadsheet. It provides an overview of Groovy and how it can be used to easily parse and process XML. It then demonstrates using Groovy with the Apache POI library to generate an Excel file from the XML data. The document shows sample Groovy code for parsing the XML, extracting the necessary data, and writing rows and cells to create the Excel spreadsheet. It concludes by noting that while more code was needed for the full requirements, Groovy allowed efficiently processing the XML and generating the Excel output.
But we're already open source! Why would I want to bring my code to Apache? (gagravarr)
From ApacheCon Europe 2015 in Budapest
So, your business has already open sourced some of its code? Great! Or you're thinking about it? That's fine! But now, someone's asking you about giving it to these Apache people? What's up with that, and why isn't just being open source enough?
In this talk, we'll look at several real world examples of where companies have chosen to contribute their existing open source code to the Apache Software Foundation. We'll see the advantages they got from it, the problems they faced along the way, why they did it, and how it helped their business. We'll also look briefly at where it may not be the right fit.
Wondering about how to take your business's open source involvement to the next level, and if contributing to projects at the Apache Software Foundation will deliver RoI, then this is the talk for you!
A presentation from ApacheCon Europe 2015 / Apache Big Data Europe 2015
Apache Tika detects and extracts metadata and text from a huge range of file formats and types. From Search to Big Data, single file to internet scale, if you've got files, Tika can help you get out useful information!
Apache Tika has been around for nearly 10 years now, and in that time, a lot has changed. Not only has the number of formats supported gone up and up, but the ways of using Tika have expanded, and some of the philosophies on the best way to handle things have altered with experience. Tika has gained support for a wide range of programming languages too, and more recently, Big-Data scale support, and ways to automatically compare the effects of changes to the library.
Whether you're an old-hand with Tika looking to know what's hot or different, or someone new looking to learn more about the power of Tika, this talk will have something in it for you!
What's with the 1s and 0s? Making sense of binary data at scale - Berlin Buzz... (gagravarr)
If you have one or two files, you can take the time to manually work out what they are, what they contain, and how to get the useful bits out (probably....). However, this approach really doesn't scale, mechanical turks or no! Luckily, there are open source projects and libraries out there which can help, and which can scale!
In this talk, we'll first look at how we can work out what a given blob of 1s and 0s actually is, be it textual or binary. We'll then see how to extract common metadata from it, along with text, embedded resources, images, and maybe even the kitchen sink! We'll see how to use things like Apache Tika to do this, along with some other libraries to complement it. Once that part's all sorted, we'll look at how to roll this all out for a large-scale Search or Big Data setup, helping you turn those 1s and 0s into useful content at scale!
This talk was given at Berlin Buzzwords 2015
The ""Apache Way"" is the process by which Apache Software Foundation projects are managed. It has evolved over many years and has produced over 100 highly successful open source projects. But what is it and how does it work?
The other Apache Technologies your Big Data solution needs (gagravarr)
The document discusses many Apache projects relevant to big data solutions, including projects for loading and querying data like Pig and Gora, building MapReduce jobs like Avro and Thrift, cloud computing with LibCloud and DeltaCloud, and extracting information from unstructured data with Tika, UIMA, OpenNLP, and cTakes. It also mentions utility projects like Chemistry, JMeter, Commons, and ManifoldCF.
How Big is Big – Tall, Grande, Venti Data? (gagravarr)
Apache has a wide range of Big Data projects, some suitable for smaller problem sets, some which scale to huge problems. Today though, that one label "Big Data" can cause confusion for new users, as they may struggle to pick the right project for the right scale for their problem.
Do we need new titles for different kinds of Big Data? Does the buzz and VC funding cause confusion? Is the humble requirement dead? Or can we help new users better find the right Apache project for them?
But We're Already Open Source! Why Would I Want To Bring My Code To Apache? (gagravarr)
So, your business has already open sourced some of its code? Great! But now, someone's asking you about giving it to these Apache people? What's up with that, and why isn't just being open source enough?
In this talk, we'll look at several real world examples of where companies have chosen to contribute their existing open source code to the Apache Software Foundation. We'll see the advantages they got from it, the problems they faced along the way, why they did it, and how it helped their business. We'll also look briefly at where it may not be the right fit.
Wondering about how to take your business's open source involvement to the next level, and if contributing to projects at the Apache Software Foundation will deliver RoI, then this is the talk for you!
What's With The 1S And 0S? Making Sense Of Binary Data At Scale With Tika And... (gagravarr)
This document provides an overview of Apache Tika, an open source toolkit for detecting and extracting metadata and structured text from various file formats. It discusses Tika's capabilities for detecting file types using filename extensions, magic bytes, and parsing file containers. It also describes how Tika extracts metadata, plain text, and XHTML from files and supports detecting text encodings and languages. The document outlines different ways to extend and customize Tika, as well as various options for integrating and running Tika programs.
What's with the 1s and 0s? Making sense of binary data at scale with Tika and... (gagravarr)
If you have one or two files, you can take the time to manually work out what they are, what they contain, and how to get the useful bits out (probably....). However, this approach really doesn't scale, mechanical turks or no! Luckily, there are Apache projects out there which can help!
In this talk, we'll first look at how we can work out what a given blob of 1s and 0s actually is, be it textual or binary. We'll then see how to extract common metadata from it, along with text, embedded resources, images, and maybe even the kitchen sink! We'll see how to do all of this with Apache Tika, and how to dive down to the underlying libraries (including its Apache friends like POI and PDFBox) for specialist cases. Finally, we'll look a little bit about how to roll this all out on a Big Data or Large-Search case.
From the Fast Feather Track at ApacheCon NA 2010 in Atlanta
This quick talk provides an overview of Apache Tika, looks at a new features and supported file formats. It then shows how to create a new parser, and finishes with using Tika from your own application.
6. What can we get in 50 mins?
• A quick overview of each project
• When talks on the project are happening
• When meetups on the project are happening
• Anything new/exciting about the project?
• What interests me in the project!
8. Apache HTTPD Server
http://httpd.apache.org/
• Talks – All day Wednesday
• Meetup – Thursday evening
• Very wide range of features
• (Fairly) easy to extend
• Can host most programming languages
• Can front most content systems
• Can proxy your content applications
• Can host code and content
9. Apache TrafficServer
http://trafficserver.apache.org/
• High performance web proxy
• Forward and reverse proxy
• Ideally suited to sitting between your content application and the internet
• For proxy-only use cases, will probably be better than httpd
• Fewer other features though
• Often used as a cloud-edge http router
11. Tomcat – What's New
http://tomcat.apache.org/
• Memory leak detection – for your applications, and for the JVM!
• Easier to embed – no need for large numbers of config files! (see the sketch below)
• Asynchronous request processing for things like Comet / Bayeux
• Servlet 3.0
• Improved JMX configurability
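To illustrate the "easier to embed" bullet, here is a minimal sketch of Tomcat started entirely from Java with no config files at all. The port, docBase and servlet are invented for the example, and small API details (such as addServletMappingDecoded) vary a little between Tomcat versions.

```java
import java.io.File;
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.catalina.Context;
import org.apache.catalina.startup.Tomcat;

public class EmbeddedTomcat {
    public static void main(String[] args) throws Exception {
        Tomcat tomcat = new Tomcat();
        tomcat.setPort(8080);          // invented port for the example
        tomcat.getConnector();         // ensure the default HTTP connector is created

        // An essentially empty docBase is fine for a purely programmatic app
        Context ctx = tomcat.addContext("", new File(".").getAbsolutePath());

        Tomcat.addServlet(ctx, "hello", new HttpServlet() {
            @Override
            protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                    throws IOException {
                resp.setContentType("text/plain");
                resp.getWriter().println("Hello from embedded Tomcat");
            }
        });
        ctx.addServletMappingDecoded("/*", "hello");

        tomcat.start();
        tomcat.getServer().await();    // block so the server keeps running
    }
}
```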
16. Which Apache NoSQL?
• Do you have tuples, documents, variable key/values or complex objects?
• Must data always be consistent?
• If you lose a chunk of machines (partition), should read/write still work?
• Query by id, range, arbitrary key/value or map-reduce function?
• How much human interaction is required to add or remove nodes?
17. Apache DB: Derby
http://db.apache.org/derby/
• Small, easy to embed SQL database
• Can be embedded and accessed via an embedded JDBC driver
• Can be accessed over the network
• Can be run entirely in-memory (see the sketch below)
• Efficient on-disk format
• Has a JavaME version – run it on basic cell phones!
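A minimal sketch of the embedded, in-memory mode: the jdbc:derby:memory: URL keeps the whole database in RAM, and recent Derby releases auto-register the embedded JDBC driver. The database and table names are invented for the example.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class DerbyInMemoryDemo {
    public static void main(String[] args) throws SQLException {
        // "memory:" keeps the database entirely in RAM; ";create=true" creates it on first use
        try (Connection conn = DriverManager.getConnection("jdbc:derby:memory:demo;create=true");
             Statement st = conn.createStatement()) {
            st.executeUpdate("CREATE TABLE talks (id INT PRIMARY KEY, title VARCHAR(100))");
            st.executeUpdate("INSERT INTO talks VALUES (1, 'Apache Content Technologies')");
            try (ResultSet rs = st.executeQuery("SELECT title FROM talks")) {
                while (rs.next()) {
                    System.out.println(rs.getString("title"));
                }
            }
        }
    }
}
```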
20. Apache Lucene
http://lucene.apache.org/
• All day Friday + Meetup Tuesday night
• Inverted index store (see the sketch below)
• (Each term lists its documents, rather than each document listing terms)
• Searching is faster than adding
• Normally stores text, but additional data can be associated with it
• Can hold indexed and un-indexed data
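For orientation, a minimal index-then-search sketch against an in-memory index. The class names here follow Lucene 8.x-era APIs (they have moved around between releases), and the field name and text are invented.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class LuceneSketch {
    public static void main(String[] args) throws Exception {
        Directory dir = new ByteBuffersDirectory();  // in-memory index, handy for demos

        // Add one document with a single analysed text field to the inverted index
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            Document doc = new Document();
            doc.add(new TextField("body", "Apache has technology for your content", Field.Store.YES));
            writer.addDocument(doc);
        }

        // Search the index for a term
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            TopDocs hits = searcher.search(
                    new QueryParser("body", new StandardAnalyzer()).parse("content"), 10);
            for (ScoreDoc sd : hits.scoreDocs) {
                System.out.println(searcher.doc(sd.doc).get("body"));
            }
        }
    }
}
```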
21. Lucene – What's New?
http://lucene.apache.org/
• Lucene and SOLR have merged
• Near real-time support when indexing
• Better storing of attributes and other data in the token stream
• Numeric fields improved – no need to externally process numbers into range buckets yourself
• Fast vector highlighter for large docs
26. Apache POI
http://poi.apache.org/
• 3pm Wednesday + Fast Feather Track
• File format reader and writer for Microsoft Office file formats
• Supports binary & OOXML formats
• Strong read / edit / write for .xls & .xlsx (see the sketch below)
• Read and basic edit for .doc & .docx
• Read and basic edit for .ppt & .pptx
• Read for Visio, Publisher, Outlook
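A minimal sketch of the write side for .xlsx, using POI's usermodel API; the file name and cell values are invented. Swapping XSSFWorkbook for HSSFWorkbook would produce the binary .xls format instead.

```java
import java.io.FileOutputStream;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class PoiSketch {
    public static void main(String[] args) throws Exception {
        // XSSFWorkbook writes the OOXML .xlsx format
        try (Workbook wb = new XSSFWorkbook();
             FileOutputStream out = new FileOutputStream("demo.xlsx")) {   // invented file name
            Sheet sheet = wb.createSheet("Projects");
            Row row = sheet.createRow(0);
            row.createCell(0).setCellValue("Apache POI");  // text cell
            row.createCell(1).setCellValue(3.8);           // numeric cell
            wb.write(out);
        }
    }
}
```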
27. Apache Tika
http://tika.apache.org/
• 9am Friday + Fast Feather Track
• Java (+ command line) toolkit for detecting and extracting content
• Identifies what a blob of content is
• Gives you consistent metadata back for it
• Parses the contents into plain text, HTML, XHTML or SAX events (see the sketch below)
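A sketch of the two core operations, detection and extraction, using Tika's convenience facade; the file name is invented, and the full Parser/Metadata API offers much finer control than shown here.

```java
import java.io.File;
import org.apache.tika.Tika;

public class TikaSketch {
    public static void main(String[] args) throws Exception {
        Tika tika = new Tika();
        File blob = new File("mystery-file");   // invented name - any blob of content

        // Detection: what is this content?
        String mediaType = tika.detect(blob);
        System.out.println("Detected type: " + mediaType);

        // Extraction: pull the contents out as plain text
        String text = tika.parseToString(blob);
        System.out.println(text);
    }
}
```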
28. Tika – What's New?
http://tika.apache.org/
• Lots of new parsers – text, office formats, publishing formats, images, audio, CAD, fonts etc
• Long standing parsers improved – better HTML from Word, for example
• Embedded resources and containers
• Usage expanding – used by many SOLR users, Alfresco, and lots of people crunching masses of data on Hadoop
29. Apache Cocoon
http://cocoon.apache.org/
• Component pipeline framework
• Plug together “Lego-like” generators, transformers and serialisers
• Generate your content once in your application, serve to different formats
• Read in formats, translate and publish
• Can power your own “Yahoo Pipes”
• Modular, powerful and easy
31. Apache XML Graphics: Batik
http://xmlgraphics.apache.org/#batik
• Java SVG toolkit + library
• SVG parser – read and process existing SVG files
• SVG generator – Graphics2D implementation that outputs SVG (see the sketch below)
• SVG DOM – easy way to manipulate your SVG files
• SVG viewer program (Squiggle)
• Command line SVG rasteriser
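A sketch of the SVG generator: SVGGraphics2D is a java.awt.Graphics2D, so ordinary AWT drawing code can be streamed out as SVG. The shape and output file name are invented for the example.

```java
import java.awt.Color;
import java.awt.Rectangle;
import java.io.FileWriter;
import java.io.Writer;
import org.apache.batik.dom.GenericDOMImplementation;
import org.apache.batik.svggen.SVGGraphics2D;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;

public class BatikSketch {
    public static void main(String[] args) throws Exception {
        // Create an SVG document to draw into
        DOMImplementation dom = GenericDOMImplementation.getDOMImplementation();
        Document doc = dom.createDocument("http://www.w3.org/2000/svg", "svg", null);

        // SVGGraphics2D is a Graphics2D, so normal AWT drawing calls just work
        SVGGraphics2D g = new SVGGraphics2D(doc);
        g.setPaint(Color.RED);
        g.fill(new Rectangle(10, 10, 100, 50));

        try (Writer out = new FileWriter("demo.svg")) {  // invented file name
            g.stream(out, true);   // true = use CSS style attributes
        }
    }
}
```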
32. Apache XML Graphics: FOP
http://xmlgraphics.apache.org/#fop
• XSL-FO processor in Java
• Reads W3C XSL-FO, applies the formatting rules to your XML document, and renders it (see the sketch below)
• Output to Text, PS, PDF, SVG, RTF, Java Graphics2D etc
• Lets you leave your XML clean, and define semantically meaningful rich rendering rules for it
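A sketch of rendering an XSL-FO file to PDF; the file names are invented, and the factory setup shown matches the newer FopFactory API (older releases configured it a little differently).

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.apache.fop.apps.Fop;
import org.apache.fop.apps.FopFactory;
import org.apache.fop.apps.MimeConstants;

public class FopSketch {
    public static void main(String[] args) throws Exception {
        FopFactory fopFactory = FopFactory.newInstance(new File(".").toURI());

        try (OutputStream out = new BufferedOutputStream(new FileOutputStream("demo.pdf"))) {
            Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, out);

            // An identity transform pushes the XSL-FO document into FOP's SAX handler
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.transform(new StreamSource(new File("demo.fo")),   // invented input file
                                  new SAXResult(fop.getDefaultHandler()));
        }
    }
}
```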
35. Apache Commons: Sanselan
http://commons.apache.org/sanselan/
• Commons Track – Thursday morning
• Pure Java image reader and writer
• Fast parsing of image metadata and information (size, color space, ICC etc) – see the sketch below
• Much easier to use than ImageIO
• Slower though, as pure Java
• Wider range of formats supported
• PNG, GIF, TIFF, JPEG + Exif, BMP, ICO, PNM, PPM, PSD, XMP
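A sketch of reading image information with Sanselan's static facade (the project later became Apache Commons Imaging, where the equivalent calls live on the Imaging class); the file name is invented.

```java
import java.io.File;
import org.apache.sanselan.ImageInfo;
import org.apache.sanselan.Sanselan;

public class SanselanSketch {
    public static void main(String[] args) throws Exception {
        File imageFile = new File("photo.jpg");   // invented file name

        // Parses just enough of the file to report metadata - no full decode needed
        ImageInfo info = Sanselan.getImageInfo(imageFile);
        System.out.println("Format: " + info.getFormat());
        System.out.println("Size:   " + info.getWidth() + "x" + info.getHeight());
        System.out.println("BPP:    " + info.getBitsPerPixel());
    }
}
```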
37. Apache Forrest
http://forrest.apache.org/
• Document rendering solution built on top of Cocoon
• Reads in content in a variety of formats (XML, wiki etc), applies the appropriate formatting rules, then outputs to different formats
• Heavily used for documentation and websites
• e.g. read in a file, format as changelog and readme, output as HTML + PDF
38. Apache Abdera
http://abdera.apache.org/
• Atom – syndication and publishing
• High performance Java implementation of RFC 4287 + 5023
• Generate Atom feeds from Java or by converting (see the sketch below)
• Parse and process Atom feeds
• AtomPub server and clients
• Supports Atom extensions like GeoRSS, MediaRSS & OpenSearch
41. Apache ManifoldCF (Incubating)
http://incubator.apache.org/connectors/
• Name has changed a few times... (Lucene/Apache Connectors)
• Provides a standard way to get content out of other systems, ready for sending to Lucene etc
• Different goals to CMIS (Chemistry)
• Uses many parsers and libraries to talk to the different repositories / systems
• Analogous to Tika but for repos
45. Chemistry vs ManifoldCF
http://incubator.apache.org/chemistry/ vs http://incubator.apache.org/connectors/
• ManifoldCF treats the repo as a nasty black box, and handles talking to the parsers
• Chemistry talks to / exposes the repo's contents through CMIS (see the sketch below)
• ManifoldCF supports a wider range of repositories
• Chemistry supports read and write
• Chemistry delivers a richer model
• ManifoldCF great for getting text out
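As a sketch of the Chemistry side of that comparison, here is how OpenCMIS (Chemistry's Java client) opens a CMIS session and walks the repository root; the endpoint URL, repository id and credentials are all invented.

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.chemistry.opencmis.client.api.CmisObject;
import org.apache.chemistry.opencmis.client.api.Folder;
import org.apache.chemistry.opencmis.client.api.Session;
import org.apache.chemistry.opencmis.client.api.SessionFactory;
import org.apache.chemistry.opencmis.client.runtime.SessionFactoryImpl;
import org.apache.chemistry.opencmis.commons.SessionParameter;
import org.apache.chemistry.opencmis.commons.enums.BindingType;

public class ChemistrySketch {
    public static void main(String[] args) {
        SessionFactory factory = SessionFactoryImpl.newInstance();

        Map<String, String> params = new HashMap<>();
        params.put(SessionParameter.ATOMPUB_URL, "http://cms.example.org/cmis/atom"); // invented
        params.put(SessionParameter.BINDING_TYPE, BindingType.ATOMPUB.value());
        params.put(SessionParameter.REPOSITORY_ID, "repo1");                          // invented
        params.put(SessionParameter.USER, "admin");                                   // invented
        params.put(SessionParameter.PASSWORD, "secret");                              // invented

        Session session = factory.createSession(params);

        // The repository is exposed as a rich object model, not just extracted text
        Folder root = session.getRootFolder();
        for (CmisObject child : root.getChildren()) {
            System.out.println(child.getName() + " (" + child.getType().getId() + ")");
        }
    }
}
```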
46. Apache Lenya
http://lenya.apache.org/
• 9am Thursday
• XML Content Management system
• Powered by Apache Cocoon
• WYSIWYG editors onto Relax-NG XML
• Rich workflow engine + staging
• Clean URLs, CSS for styling
• Sensible handling of metadata, assets, internal links, users, permissions etc
47. Apache Roller
http://roller.apache.org/
• Multi-user blog server
• Used by the ASF internally
• Scales to thousands of users & blogs
• Should work with any JavaEE servlet container and SQL database
• Comment moderation and spam filters
• Each author has full layout control
• Indexes, feeds and MetaWeblog API support for 3rd-party clients
48. Apache Shindig
http://shindig.apache.org/
• OpenSocial application container
• Hosts your OpenSocial widgets
• Renders OpenSocial applications into HTML + JavaScript
• Stores the data for your application
• Full client-side JavaScript libraries to deliver gadget functionality
• Reference implementation
51. Apache Sling
http://sling.apache.org/
• 12pm Wednesday
• “Fun” and easy web framework
• REST based
• Backed by the Jackrabbit content repo
• Powered by OSGi
• Easy to script, supports multiple output languages (JSP, server-side JavaScript, Scala etc)
• Stores both templates and content
52. Apache Tapestry
http://tapestry.apache.org/
• Object-oriented web applications
• Build your application in terms of objects, methods and properties
• Tapestry handles URLs, query parameters and state for you
• Pages built with simple HTML
• Concentrate on the content that backs each part, and the business logic for it
• Tapestry glues it together for you
53. Apache Tiles
http://tiles.apache.org/
• Templating framework for Java
• Works well with Struts and Shale
• Lets you build your page from lots of tiles (components), which can nest
• Build tiles together to make templates
• Clean separation between your content, the business logic to select it, and the rendering rules
54. Apache Velocity
http://velocity.apache.org/
• Templating engine
• MVC webapp or standalone
• Can generate HTML, SQL, PostScript, XML, Java code or email from templates
• Anakia lets you make an xdoc file available to a velocity template, handy when generating HTML from xdoc
• Fairly rich templating language (see the sketch below)
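A minimal standalone sketch of the templating engine evaluating an inline template; the template string and context value are invented.

```java
import java.io.StringWriter;
import org.apache.velocity.VelocityContext;
import org.apache.velocity.app.VelocityEngine;

public class VelocitySketch {
    public static void main(String[] args) {
        VelocityEngine engine = new VelocityEngine();
        engine.init();

        // The context supplies the values the template references
        VelocityContext context = new VelocityContext();
        context.put("project", "Apache Velocity");   // invented value

        StringWriter out = new StringWriter();
        engine.evaluate(context, out, "demo", "Hello from $project!");
        System.out.println(out);   // -> Hello from Apache Velocity!
    }
}
```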
55. Apache Wicket
http://wicket.apache.org/
• Build your web applications in Java (see the sketch below)
• Uses Java in preference to JavaScript, CSS etc
• Handy if you have a strong Java team and you need to do some web stuff
• Fits well with your Java components
• But JS / CSS front end devs tend to be cheaper than Java ones...
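To show the Java-first style, a minimal Wicket page sketch; it assumes a matching HelloPage.html template containing a <span wicket:id="message"> element, and all names are invented.

```java
import org.apache.wicket.markup.html.WebPage;
import org.apache.wicket.markup.html.basic.Label;

// Assumes a HelloPage.html next to this class containing:
//   <span wicket:id="message"></span>
public class HelloPage extends WebPage {
    public HelloPage() {
        // The component tree is built in Java; the HTML stays plain markup
        add(new Label("message", "Hello from Wicket"));
    }
}
```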
56. Apache Clerezza (Incubating)
http://incubator.apache.org/clerezza/
• OSGi based modular semantic web application framework
• Lets you build applications that fit into the Semantic Web
• Stores and easily manipulates RDF
• Full control over REST and URIs
• Build applications that both consume semantic data (e.g. RDF files) and expose content to others