AlgoDaily - Advancing your Search Engine

Developing our search engine further, we now turn to the core component of any search engine: the relevance algorithm. It sorts search results by a relevance score so that the documents that best match the intent of the query come first.

Just like in finance, where portfolio managers rank their investments against certain criteria, our relevance algorithm ranks documents by how well they align with the search query.

This is loosely similar to how AI models compute probabilities to make predictions. The search engine uses a comparable scoring step to rank documents and present the most relevant ones at the top of the search results.

To illustrate, we'll create a simplified, primitive version of this relevance algorithm that considers only the frequency of the query terms in each document.

In the Python code example below, we look up each query term's frequency in each document and add these frequencies up to derive a 'relevance score'. The higher the score, the more relevant the document is to the search query.
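Written as a formula, the scoring rule is a plain sum. The notation here is introduced only for illustration: $\mathrm{tf}(t, d)$ stands for how often term $t$ occurs in document $d$, and $q$ is the set of query terms.

```latex
\mathrm{score}(d, q) = \sum_{t \in q} \mathrm{tf}(t, d)
```

For example, for the query [term1, term2], a document containing term1 twice and term2 three times scores 2 + 3 = 5. A document containing only term1 five times scores 5 + 0 = 5 as well, which shows that this simple rule does not reward matching more of the query terms.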

However, keep in mind that this is a basic relevance algorithm. Real-world search engines use far more sophisticated algorithms that consider factors such as user behavior, language nuances, document interconnectedness, and much more.

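if __name__ == "__main__":
    # Per-document term counts: document ID -> {term: frequency}.
    # (Strictly speaking, an inverted index maps term -> documents;
    # this layout keeps the example short.)
    inverted_index = {
        'doc1': {'term1': 2, 'term2': 3},
        'doc2': {'term1': 1, 'term2': 2},
        'doc3': {'term1': 5},
    }
    query_terms = ['term1', 'term2']

    # Relevance score = sum of the query-term frequencies in the document;
    # a term missing from a document contributes 0.
    relevance_scores = {}
    for doc, terms in inverted_index.items():
        relevance_scores[doc] = sum(terms.get(query_term, 0) for query_term in query_terms)

    # Sort document IDs by score, highest first.
    sorted_docs = sorted(relevance_scores, key=relevance_scores.get, reverse=True)

    for doc in sorted_docs:
        print(f'{doc}: {relevance_scores[doc]}')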
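
The example above starts from precomputed term counts. As a rough sketch of where such counts could come from, the snippet below builds a term-to-document index from raw text and then applies the same summed-frequency rule. The function names `build_inverted_index` and `rank_documents`, the whitespace tokenizer, and the sample documents are all assumptions made for illustration, not part of the lesson's code.

```python
from collections import Counter, defaultdict


def build_inverted_index(documents):
    """Map each term to {doc_id: frequency} (hypothetical helper)."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        # Naive tokenizer (assumption): lowercase and split on whitespace.
        for term, count in Counter(text.lower().split()).items():
            index[term][doc_id] = count
    return index


def rank_documents(index, query_terms):
    """Sum query-term frequencies per document, highest score first."""
    scores = Counter()
    for term in query_terms:
        # Only the postings of the query terms are visited.
        for doc_id, count in index.get(term.lower(), {}).items():
            scores[doc_id] += count
    # most_common() returns (doc_id, score) pairs in descending score order.
    return scores.most_common()


# Made-up sample documents, for illustration only.
documents = {
    "doc1": "fast search engine search",
    "doc2": "search results",
    "doc3": "database engine",
}
index = build_inverted_index(documents)
ranking = rank_documents(index, ["search", "engine"])
for doc_id, score in ranking:
    print(f"{doc_id}: {score}")
```

With this layout, scoring only touches the entries stored under the query terms instead of scanning every document, and documents that contain none of the query terms never appear in the ranking.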
