Search indexing in AEM
Overview
An index is a data structure that lets Adobe Experience Manager (AEM) quickly locate and
access content in its repository. JCR queries use indexes to search for and retrieve content,
which improves query performance and reduces the load on AEM. Indexing significantly speeds
up data retrieval. However, when queries are infrequent and the amount of content to be
searched is small, an index may be unnecessary.
Indexing Mechanisms
Oak, the repository implementation in AEM, can index the content it stores. It supports
Lucene-based indexes, which can handle both property and full-text constraints. When several
indexes could serve a query, each index estimates the cost of executing it, and the query
engine uses the one with the lowest estimated cost.
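To make the cost-based selection concrete, here is a toy model in Python. It is not Oak's `QueryIndex` API: `IndexCandidate`, `choose_index`, the index names, and the cost numbers are all hypothetical. Each candidate either returns an estimated cost or `None` when it cannot serve the query, and the planner keeps the cheapest option, falling back to traversal.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class IndexCandidate:
    """A hypothetical index that can estimate what a query would cost it."""

    name: str
    # Returns an estimated cost, or None if this index cannot serve the query.
    estimate: Callable[[dict], Optional[float]]


def choose_index(query: dict, indexes: list[IndexCandidate],
                 traversal_cost: float) -> tuple[str, float]:
    """Pick the cheapest plan, falling back to traversal when no index applies."""
    best_name, best_cost = "traverse", traversal_cost
    for index in indexes:
        cost = index.estimate(query)
        if cost is not None and cost < best_cost:
            best_name, best_cost = index.name, cost
    return best_name, best_cost


# Hypothetical indexes: each one can only serve queries on a single property.
title_index = IndexCandidate(
    "titleLucene", lambda q: 2.0 if q.get("property") == "jcr:title" else None)
template_index = IndexCandidate(
    "templateProperty", lambda q: 1.0 if q.get("property") == "cq:template" else None)

print(choose_index({"property": "jcr:title"}, [title_index, template_index], 1000.0))
# ('titleLucene', 2.0)
print(choose_index({"property": "sling:resourceType"}, [title_index, template_index], 1000.0))
# ('traverse', 1000.0)
```

The second call shows the case the Query Performance tool reports as traversal: no index can serve the restriction, so the only remaining plan is to walk the content.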
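Synchronous Indexing:
• The index is updated as part of the same commit that changes the repository content.
• Query results always reflect the latest committed state, but every commit pays the cost
of updating the index.
Asynchronous Indexing:
• Oak periodically updates the index based on changes detected in the repository content.
• Improves performance because index updates are not tied directly to commit operations;
the trade-off is that query results can briefly lag behind the latest changes.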
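The difference between the two modes can be sketched with a toy in-memory repository. This is not how Oak is implemented: `ToyRepository` and its methods are hypothetical, and `run_async_indexer()` merely stands in for Oak's periodic background indexing job.

```python
class ToyRepository:
    """Toy model contrasting synchronous and asynchronous index maintenance.

    Not Oak's implementation: sync_index is updated inside commit(), while
    async_index only catches up when run_async_indexer() is called.
    """

    def __init__(self):
        self.content = {}        # path -> title
        self.sync_index = {}     # title -> set of paths, updated on every commit
        self.async_index = {}    # title -> set of paths, updated periodically
        self._pending = set()    # paths changed since the last async run

    @staticmethod
    def _put(index, path, title):
        # Drop the path from any old entry, then add it under the new title.
        for paths in index.values():
            paths.discard(path)
        index.setdefault(title, set()).add(path)

    def commit(self, path, title):
        self.content[path] = title
        self._put(self.sync_index, path, title)  # synchronous: part of the commit
        self._pending.add(path)                  # asynchronous: deferred

    def run_async_indexer(self):
        # Stand-in for the periodic job: apply all changes seen since the last run.
        for path in self._pending:
            self._put(self.async_index, path, self.content[path])
        self._pending.clear()

    def query(self, title, use_async):
        index = self.async_index if use_async else self.sync_index
        return sorted(index.get(title, set()))


repo = ToyRepository()
repo.commit("/content/site/en", "Home")
print(repo.query("Home", use_async=False))  # ['/content/site/en']
print(repo.query("Home", use_async=True))   # [] (async index has not run yet)
repo.run_async_indexer()
print(repo.query("Home", use_async=True))   # ['/content/site/en']
```

The empty result from the asynchronous index between the commit and the next indexer run is exactly the lag that the note on Lucene property indexes warns about.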
Note: Oak internally uses Apache Lucene for indexing repository content. Update and delete
operations in the repository trigger incremental updates of the Lucene index rather than a
full reindex.
Index Creation
Note: Unlike the regular Property Index, the Lucene Property Index is always configured in
asynchronous mode. Consequently, query results may not always reflect the most up-to-date
state of the repository.
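An Oak Lucene property index is a node under `/oak:index`. The property names below (`type`, `async`, `compatVersion`, `indexRules`, `properties`, `name`, `propertyIndex`) follow the Oak Lucene index documentation; the helper functions, the `/oak:index/titleIndex` path, and the child-node naming convention are hypothetical. The sketch builds the definition as nested Python dicts (dicts are child nodes, everything else is a JCR property) and lists every property path, which is handy for reviewing a definition before deploying it, for example in a content package.

```python
def lucene_property_index(property_name, node_type="nt:base"):
    """Return a hypothetical Oak Lucene property index definition as nested dicts.

    Nested dicts are child nodes; all other values are JCR properties.
    """
    prop_node = property_name.split(":")[-1]  # e.g. "jcr:title" -> "title"
    return {
        "jcr:primaryType": "oak:QueryIndexDefinition",
        "type": "lucene",
        "async": ["async"],  # Lucene property indexes run asynchronously
        "compatVersion": 2,
        "indexRules": {
            "jcr:primaryType": "nt:unstructured",
            node_type: {
                "jcr:primaryType": "nt:unstructured",
                "properties": {
                    "jcr:primaryType": "nt:unstructured",
                    prop_node: {
                        "jcr:primaryType": "nt:unstructured",
                        "name": property_name,
                        "propertyIndex": True,
                    },
                },
            },
        },
    }


def flatten(node, path):
    """Yield (property_path, value) pairs for every property in the tree."""
    for key, value in node.items():
        if isinstance(value, dict):
            yield from flatten(value, f"{path}/{key}")
        else:
            yield f"{path}/{key}", value


definition = lucene_property_index("jcr:title", node_type="cq:PageContent")
for prop_path, value in flatten(definition, "/oak:index/titleIndex"):
    print(prop_path, "=", value)
```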
Query Performance Tool:
The Query Performance tool is located at
http://localhost:4502/libs/granite/operations/content/diagnosistools/queryPerformance.html
For a given query, it reports:
• The indexes used when executing the query (or no index if the query would be executed
using Repository Traversal).
• The execution time (if the Include Execution Time checkbox is checked) and the count of
results read (if the Read first page of results or Include Node Count checkbox is checked).
• The execution plan, allowing detailed analysis of how the query is executed; see Reading
the Query Execution Plan for how to interpret it.
• The paths of the first 20 query results (if the Read first page of results checkbox is
checked).
• The full logs of the query planning, showing the relative costs of the indexes that were
considered for this query (the index with the lowest cost is the one chosen).
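When execution plans are collected for review (copied from the tool, or obtained by prefixing a JCR-SQL2 query with `explain`, which Oak supports), a small check can flag queries that fall back to traversal. The sketch below is hypothetical: `describe_plan` and the two sample plan strings are illustrative, modeled on the general shape of Oak plans, and real plans contain more detail.

```python
import re


def describe_plan(plan: str) -> str:
    """Classify an Oak execution plan string (plan formats are illustrative)."""
    if re.search(r"/\*\s*traverse\b", plan):
        return "traversal"  # no index could serve the query
    match = re.search(r"/\*\s*lucene:([^\s(]+)", plan)
    if match:
        return "lucene index " + match.group(1)
    return "other"


# Illustrative plans in the general shape the tool displays.
traversal_plan = ('[nt:unstructured] as [a] /* traverse "/content//*" '
                  "where [a].[jcr:title] = 'Home' */")
lucene_plan = ("[cq:Page] as [a] /* lucene:cqPageLucene(/oak:index/cqPageLucene) "
               "+:ancestors:/content */")

print(describe_plan(traversal_plan))  # traversal
print(describe_plan(lucene_plan))     # lucene index cqPageLucene
```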
Solr Full-Text Indexing:
The Solr index is designed mainly for full-text search, but it can also serve path, property,
and primary type restrictions, so the Solr index in Oak can be used for any type of JCR query.
Integration with AEM happens at the repository level: Solr is one of the indexing options
available in Oak, the repository implementation shipped with AEM.
By integrating AEM with Apache Solr, organizations can leverage Solr's advanced search
capabilities to enhance the search experience within their AEM-powered digital
properties. This integration allows for more efficient content discovery, targeted search
results, and improved user engagement.
The first step is to download and install the Apache Solr software on a dedicated server or within your
AEM infrastructure. Ensure that the Solr version is compatible with the AEM version you are using.
URL: https://archive.apache.org/dist/lucene/solr/
Unzip the downloaded archive, go to the bin folder of the extracted Solr directory, and run
the solr script (solr.cmd on Windows) with the start command to start the Solr server.
Once the server starts, a confirmation message appears in the command prompt. By default,
Solr listens on port 8983:
➔ http://localhost:8983/solr
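If the server is started from a script, it helps to wait until Solr actually answers before creating collections. This is a minimal sketch using only the Python standard library; it assumes the default base URL above and polls Solr's `/admin/info/system` admin endpoint. The function names and retry values are hypothetical.

```python
import time
import urllib.error
import urllib.request


def solr_is_up(base_url="http://localhost:8983/solr", timeout=2.0):
    """Return True if Solr answers the system-info admin request."""
    url = base_url.rstrip("/") + "/admin/info/system?wt=json"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: not ready yet.
        return False


def wait_for_solr(base_url="http://localhost:8983/solr", attempts=10, delay=1.0):
    """Poll Solr up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if solr_is_up(base_url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


if __name__ == "__main__":
    print("Solr ready:", wait_for_solr())
```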
Next, open the Solr web client and create a collection. A collection holds a single search
index. Follow these steps to create an index in Solr:
From the web client, select Collections and click Add Collection.
Provide the name of the collection, here "collection". Choose a config set from the dropdown;
I am using the getting started config. Set the number of shards according to your
requirements; I am keeping it at 1 for now.
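The Add Collection form corresponds to a CREATE call on Solr's Collections API, which is available only when Solr runs in SolrCloud mode. The sketch below builds that request with the standard library. It assumes the local URL from above, a single shard and replica, and a config set named `gettingstarted`; that name is an assumption for the "getting started" config mentioned in the text, so check the dropdown for the exact name on your installation.

```python
import json
import urllib.request
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards=1, replication_factor=1,
                          config_name="_default"):
    """Build a Collections API CREATE request URL (SolrCloud mode only)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "collection.configName": config_name,
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


def create_collection(base_url, name, **options):
    """Send the CREATE request and return Solr's parsed JSON response."""
    url = create_collection_url(base_url, name, **options)
    with urllib.request.urlopen(url, timeout=60) as resp:
        return json.load(resp)


print(create_collection_url("http://localhost:8983/solr", "collection",
                            config_name="gettingstarted"))
```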
Integrate Solr with your AEM instance by configuring the necessary connections, settings, and security
parameters. This includes specifying the Solr server URL and the indexing strategy.
➔ Create a component whose HTML file makes an AJAX call to the Solr server URL; this call is where
Solr is integrated with AEM.
➔ Drag and drop the component onto a page.
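The component's AJAX call is a plain HTTP request to the collection's `/select` handler. The Python sketch below shows the same request shape, which is useful for testing the collection from a script before wiring up the component. It assumes the collection name and URL used above; the `title` field in the example query and the helper names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_select_url(base_url, collection, query, rows=10):
    """Build a Solr /select URL with a query, a row limit, and JSON output."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"


def parse_results(payload):
    """Extract (numFound, docs) from a Solr JSON select response."""
    response = payload["response"]
    return response["numFound"], response["docs"]


def search(base_url, collection, query, rows=10):
    """Run the query against a live Solr server and return (numFound, docs)."""
    url = build_select_url(base_url, collection, query, rows)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_results(json.load(resp))


print(build_select_url("http://localhost:8983/solr", "collection", "title:home", rows=5))
```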
Tailor the Solr schema to match the content and metadata structure of your AEM implementation.
This allows you to optimize the indexing and retrieval of relevant data for your specific use case.
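One way to tailor the schema is Solr's Schema API, which accepts JSON commands such as `add-field` when the collection uses a managed schema (as the `_default` config set does). This sketch only builds the POST request; the field name `resourceType`, the `string` type choice, and the helper name are hypothetical examples of mapping an AEM property into Solr.

```python
import json
import urllib.request


def add_field_request(base_url, collection, name, field_type="text_general",
                      stored=True, multi_valued=False):
    """Build a Schema API request that adds one field to the collection's schema."""
    body = {
        "add-field": {
            "name": name,
            "type": field_type,
            "stored": stored,
            "multiValued": multi_valued,
        }
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/{collection}/schema",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = add_field_request("http://localhost:8983/solr", "collection",
                        "resourceType", field_type="string")
# Send it to a running server with: urllib.request.urlopen(req, timeout=30)
print(req.get_method(), req.full_url)
```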
Importance of Indexing Over Traversing:
Indexing and traversing are two different approaches to accessing data, and each has its own
benefits. Here's a breakdown of why indexing is often preferred over traversing, especially in
the context of search systems.
1. Performance Efficiency:
Indexing allows rapid retrieval of specific content from pre-built structures, resulting in
significantly faster response times than traversing the entire content tree.
2. Reduced Resource Consumption:
Traversing involves scanning through the content structure node by node, which leads to
higher resource consumption. Indexing avoids exhaustive traversal and optimizes resource
usage.
3. Scalability:
Indexing lets the system scale efficiently as the content repository grows, maintaining
performance levels by avoiding the computational overhead of extensive traversals.
4. Real-Time Updates:
Indexes are updated as content changes, so search results remain accurate and up to date,
reflecting the dynamic nature of content in AEM. Synchronous indexes reflect a change as soon
as it is committed; asynchronous indexes, such as Lucene indexes, catch up shortly afterwards.
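The first two points can be illustrated by counting how many entries each approach reads. This is a toy model, not Oak: the flat `nodes` dict, the `template` property, and the helper names are hypothetical, and a real index lookup also has its own (small) overhead. Traversal must inspect every node, while the pre-built index reads only the matching entries.

```python
def traverse(nodes, prop, value):
    """Scan every node; return (matching paths, number of nodes visited)."""
    hits, visited = [], 0
    for path, props in nodes.items():
        visited += 1
        if props.get(prop) == value:
            hits.append(path)
    return hits, visited


def build_index(nodes, prop):
    """Pre-build a value -> paths map for one property (done once, ahead of time)."""
    index = {}
    for path, props in nodes.items():
        if prop in props:
            index.setdefault(props[prop], []).append(path)
    return index


def lookup(index, value):
    """Return (matching paths, number of index entries read)."""
    hits = index.get(value, [])
    return hits, len(hits)


# Hypothetical repository: 10,000 pages, 1 in 100 uses the "article" template.
nodes = {f"/content/site/page{i}": {"template": "article" if i % 100 == 0 else "basic"}
         for i in range(10_000)}
index = build_index(nodes, "template")

_, visited = traverse(nodes, "template", "article")
_, read = lookup(index, "article")
print("traversal visited:", visited, "| index entries read:", read)
```

As the repository grows, the traversal count grows with the total number of nodes, while the index lookup grows only with the number of matches, which is the scalability argument above.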
Conclusion: