Azure Search is an Azure service that launched in Feb 2015; it has gained many features since then, and the dev team are still busy growing it.
In a nutshell, Azure Search is “Search as a service” – a simple API for developers to push data into (for indexing) and run searches over to pull back results.
Why not use a SQL Azure instance for this?
SQL Azure’s a relational database, and is great for managing entities with relationships and accessing the data using SQL queries. That’s fine, but if you want to populate it with 15 million records and run some pretty complex search terms over it, you’re going to be making it do things it’s not really best at. This is where Azure Search comes in. It’s built on top of Lucene and Elasticsearch.
Your index data (“Documents”) is stored and indexed in a way that’s fit for search engines to consume, the service is massively scalable for size and performance, and it can carry out some really neat search functionality when you call it (you do need to prepare your fields and indexes in advance to affect things like how the input is analysed or matched, or how the results are weighted and ranked).
For example, if you have a product catalog of bikes and your user searches for the term “Mountan”, Azure Search can realise that this is a spelling mistake and that the user probably meant “Mountain”, and carry out the search with the correct term (like the big internet search engines do).
- Facets (these are like properties of a document) – sticking with the bike example, you might have facets for Size, Colour, Price (range), Year, Type (Mountain, Road, CX, hybrid,…) – allowing the user to filter on one or more facets to narrow the search results down
- Indexers – an indexer is a service that runs in Azure on a schedule, looks at your data sources on a regular basis, pulls updates from them and pushes them to the Azure Search index. You can currently set up indexers for SQL Azure, DocumentDB and SQL IaaS (SQL on a VM in Azure or on premises), and more are in the pipeline
- Pagination – simply, retrieving results a page at a time rather than all in one go, so you don’t have to do the hard work of paging them yourself.
- Suggesters – these are special features of an index that support the “autocomplete” type functionality of a text-based search – again, the suggesters can support spelling mistake corrections, hit highlighting on the page and so on.
- Scoring profiles – you can set up multiple profiles which place a weighting on certain fields within your index. For example, if you know the search type you’re offering your user should present them with more recently added data first, then you can weight based on “Freshness”. Other factors can be tag matches, specific field matches (even fields you don’t want to retrieve to show your users), magnitude (how big/small a matched number or date is), and distance…
- oh yes, Azure Search supports geospatial indexing and search as well, so if you can load lat/lon points into your documents, you can search either within a radius around a point, or for any matches within a polygon you define
- Language analysers – these are plugins that use Microsoft’s Natural Language Processors to carry out the clever normalization, filtering, word-breaking, stemming and trimming of the search terms. For example, if you put in the phrase “Dog falling into pool”, the analyser will look at this and search for terms around “fall”, “falls” and “fell” as well as “falling”. See Language Analysers on MSDN.
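To make a few of those features concrete, here is a rough Python sketch of how a search request with facets, a filter, paging and a distance restriction might be put together against the REST API. The service name, index name, field names (`type`, `colour`, `price`, `shopLocation`) and the API version are all assumptions made up for the bike example, and the code only builds the URL rather than calling the service.

```python
from urllib.parse import urlencode, quote

API_VERSION = "2016-09-01"  # assumption: use whichever version your service supports


def within_km(field, lat, lon, km):
    """OData filter: documents whose geo point lies within `km` of (lat, lon)."""
    # Note the order inside POINT(...): longitude first, then latitude.
    return f"geo.distance({field}, geography'POINT({lon} {lat})') le {km}"


def build_search_url(service, index, term, *, filter_expr=None, facets=(),
                     page=1, page_size=10):
    """Build the GET URL for one page of search results."""
    params = [
        ("search", term),
        ("$top", page_size),                # page size
        ("$skip", (page - 1) * page_size),  # how many results to skip
        ("$count", "true"),                 # ask for the total hit count too
        ("api-version", API_VERSION),
    ]
    if filter_expr:
        params.append(("$filter", filter_expr))
    # One "facet" parameter per facet the UI wants counts for.
    params.extend(("facet", f) for f in facets)
    query = urlencode(params, quote_via=quote, safe="$")
    return f"https://{service}.search.windows.net/indexes/{index}/docs?{query}"


# Hypothetical usage: page 2 of mountain bikes within 25 km of a point.
url = build_search_url(
    "my-service", "bikes", "mountain",
    filter_expr="type eq 'Mountain' and " + within_km("shopLocation", 51.5, -0.12, 25),
    facets=["colour", "price,interval:250"],
    page=2,
)
print(url)
```

The request itself would also need an `api-key` header carrying one of your service’s keys.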
How do I use it?
I’m not going to reinvent the wheel; instead I will point you at a great Getting Started article (here) which will take you through the process with some sample data, and uses Fiddler to make the REST API calls for populating and searching.
From a coding point of view, you have the option of the Search REST API, or the Search .NET SDK (which I prefer, because it works well and saves a fair bit of time).
From a process point of view –
- Set up your Azure Search service (the Free service plan gives you 50MB of storage and 10,000 documents across your indexes; Standard jumps up to 15 million documents and 25GB).
- Create an index
- Either write some code to populate the index from your data sources, or, if they’re SQL Azure, DocumentDB or SQL in IaaS (and you want to do it this way), use the built-in indexers on a schedule – you can set all this up from the Portal.
- Write your search code and integrate with your application
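The “create an index” and “populate the index” steps above could look roughly like this over the REST API. This is a minimal sketch in Python rather than the .NET SDK, and the service name, key placeholder, index schema, scoring profile and API version are all assumptions for the bike example. It prepares the requests but deliberately doesn’t send them.

```python
import json
from urllib.request import Request

SERVICE = "my-service"        # hypothetical service name
API_KEY = "<admin-api-key>"   # placeholder, not a real key
API_VERSION = "2016-09-01"    # assumption: use a version your service supports
BASE = f"https://{SERVICE}.search.windows.net"

# A made-up schema for the bike catalogue.
index_definition = {
    "name": "bikes",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "name", "type": "Edm.String", "searchable": True,
         "analyzer": "en.microsoft"},
        {"name": "type", "type": "Edm.String", "filterable": True, "facetable": True},
        {"name": "price", "type": "Edm.Double", "filterable": True,
         "facetable": True, "sortable": True},
        {"name": "added", "type": "Edm.DateTimeOffset", "filterable": True,
         "sortable": True},
    ],
    "suggesters": [
        {"name": "sg", "searchMode": "analyzingInfixMatching", "sourceFields": ["name"]}
    ],
    "scoringProfiles": [{
        "name": "newestFirst",
        # Boost documents added in the last 30 days.
        "functions": [{
            "type": "freshness", "fieldName": "added", "boost": 5,
            "interpolation": "linear",
            "freshness": {"boostingDuration": "P30D"},
        }],
    }],
}


def admin_request(method, path, body):
    """Prepare (but do not send) a JSON request to the search service."""
    return Request(
        f"{BASE}{path}?api-version={API_VERSION}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": API_KEY},
        method=method,
    )


create_index = admin_request("POST", "/indexes", index_definition)
upload_docs = admin_request("POST", "/indexes/bikes/docs/index", {"value": [
    {"@search.action": "upload", "id": "1", "name": "Trail 29er",
     "type": "Mountain", "price": 899.0, "added": "2015-06-01T00:00:00Z"},
]})
# urllib.request.urlopen(create_index) would actually send the request.
```

Documents are pushed in batches, and each one carries an `@search.action` saying what to do with it.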
What else should I know?
The data you push to be indexed is copied into the Azure Search Lucene database, which is not contained in your own storage account – it’s within a Microsoft-managed account. Hence, for recoverability, you would need to raise a support ticket; there is no self-service option as there is with Azure SQL databases. The storage is still Azure’s and hence is subject to the same levels of redundancy and resilience you’d expect.
There isn’t (yet) any support for nested documents, e.g. one field within your index that’s a mini JSON document, but for most use cases this isn’t the best way to design your indexes anyway.
You can, however, store arrays of strings within a field (such as tags), and you can create new fields without having to recreate the index from scratch, so if new metadata is added to your document sources, you can add it to the Azure Search index without any problem.
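As a rough illustration of both points, the Python sketch below shows a string-array field for tags and a helper that appends a new field to an existing index definition. The index, the field names and the helper `with_new_field` are hypothetical; the updated definition would then be sent back to the service as an index update.

```python
import copy


def with_new_field(index_definition, field):
    """Return a copy of the index definition with one extra field appended."""
    existing = {f["name"] for f in index_definition["fields"]}
    if field["name"] in existing:
        raise ValueError(f"field {field['name']!r} already exists")
    updated = copy.deepcopy(index_definition)  # leave the original untouched
    updated["fields"].append(field)
    return updated


# Hypothetical current definition, with tags stored as an array of strings.
current = {
    "name": "bikes",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "tags", "type": "Collection(Edm.String)",
         "searchable": True, "filterable": True, "facetable": True},
    ],
}

# New metadata has appeared in the source data, so add a field for it.
updated = with_new_field(current, {
    "name": "frameMaterial", "type": "Edm.String",
    "filterable": True, "facetable": True,
})

# An OData filter that matches documents carrying a particular tag.
tag_filter = "tags/any(t: t eq 'tubeless')"
print([f["name"] for f in updated["fields"]], tag_filter)
```

Documents already in the index simply have no value for the new field until you push updates for them.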
If you want to push textual documents into a field, you should process them first (i.e. if they’re HTML, remove the tags and convert to plain text). The ability to index non-plain-text document types is not (yet) there.
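For the HTML case, a simple pre-processing step might look like the sketch below, which uses only Python’s standard library. It is one possible approach, not something Azure Search provides, and the helper name `html_to_text` is made up for illustration.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML document, skipping scripts and styles."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &amp; into & and so on
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join chunks with spaces, then collapse runs of whitespace.
        # A word split by inline tags (wo<b>rd</b>) will come out as two words.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    """Strip tags from `html` and return plain text ready to push into a field."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = "<h1>Trail 29er</h1><p>A <b>fast</b> bike &amp; more.</p><script>x()</script>"
print(html_to_text(sample))
```

For messy real-world pages a dedicated HTML library would probably cope better, but the idea is the same: the field should receive plain text.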
Analytics – the log data’s not exposed (yet) to customers, so if you want to do your own analytics on search terms, throughput, index response, etc., then you need to D-I-Y for now.
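A D-I-Y approach can be as small as wrapping whatever function makes your search call and recording the term, the timing and the number of results yourself. The Python sketch below assumes a hypothetical `fake_search` stand-in and an in-memory list; in practice you would call the real service and write to your own store.

```python
import time

search_log = []  # stand-in for your own analytics store


def logged_search(search_fn, term, **kwargs):
    """Run a search through `search_fn` and record basic analytics about it."""
    started = time.perf_counter()
    results = search_fn(term, **kwargs)
    search_log.append({
        "term": term,
        "elapsed_ms": (time.perf_counter() - started) * 1000,
        "result_count": len(results),
    })
    return results


def fake_search(term, **kwargs):
    """Hypothetical stand-in for the real call to the search service."""
    catalogue = [{"id": "1", "name": "Trail 29er"}]
    return [doc for doc in catalogue if term.lower() in doc["name"].lower()]


logged_search(fake_search, "Trail")
logged_search(fake_search, "Mountan")
print(search_log)
```

Terms that keep coming back with zero results are a good place to start looking for missing synonyms or common misspellings.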
Great – is that it?
Almost – a couple of other things that might be of interest:
The Search team are working closely with the Media Services team. There are projects in the public domain that do some really amazing indexing. For example, one processes video files as part of the indexing push, extracts text from the spoken words in the video clip and indexes it all. Users can then search for a phrase, and when the search result is displayed along with the matching extract of transcript containing the search word, they can click on the video clip and it’ll play the video from just before that point. Take a look here http://fermion-test.azurewebsites.net/ and try searching for something like “customer”.
Also, a colleague of mine, Chris Stone, has written a very nice API exerciser you can find at https://azsearch.azurewebsites.net
As well as some excellent search samples here:
As with all Azure services, Search is growing and changing at a very fast pace – it has some great features coming in the next few months, so keep your eye on the #azuresearch hashtag on Twitter.