AlgoDaily - Find Duplicate Words

The Foundation: Understanding the Problem

To find duplicate words in a sentence, we need a systematic way to count the occurrences of each word. Our task is to examine every word in the given string and keep track of how often it appears; any word that appears more than once is a duplicate.

Step 1: Tokenizing the Sentence into Words

What Does Tokenizing Mean?

Tokenization is the process of converting a sequence of text into individual "tokens" or units. In our context, this means splitting the given sentence into individual words.

How to Tokenize?

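We can start by converting the entire string to lowercase so that the comparison is case-insensitive ("The" and "the" count as the same word). Then we'll call the Split method with a space as the delimiter to break the string into an array of words. Note that splitting on a single space leaves punctuation attached to words and produces empty strings wherever two spaces appear in a row.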

string[] splitS = s.ToLower().Split(' ');
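
A single-space split treats "dog," and "dog" as different words and yields empty entries for repeated spaces. The following is a minimal sketch of a more forgiving tokenizer, assuming that whitespace and a handful of common punctuation marks are the only separators that matter. The class name TokenizeSketch, the helper Tokenize, and the separator list are hypothetical and chosen for illustration only.

```csharp
using System;

class TokenizeSketch {
  // Hypothetical helper (not part of the original lesson): lowercases the
  // input, then splits on whitespace and a few common punctuation marks.
  public static string[] Tokenize(string s) {
    // Assumed separator set; extend it if the input uses other punctuation.
    char[] separators = { ' ', '\t', '\n', ',', '.', '!', '?', ';', ':' };

    // RemoveEmptyEntries drops the empty strings that consecutive
    // separators (for example two spaces) would otherwise produce.
    return s.ToLowerInvariant().Split(separators, StringSplitOptions.RemoveEmptyEntries);
  }

  public static void Main(string[] args) {
    string[] words = Tokenize("The  cat saw the dog, and the DOG ran.");
    Console.WriteLine(string.Join("|", words));
    // prints: the|cat|saw|the|dog|and|the|dog|ran
  }
}
```

Whether this extra handling is needed depends on the input; for sentences with single spaces and no punctuation, the one-line split shown earlier behaves the same way.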

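using System;
using System.Collections.Generic;

class Program {
  public static void Main(string[] args) {
    string s = "Original String";

    // Lowercase the sentence and split it on spaces to get the words.
    string[] splitS = s.ToLower().Split(' ');

    // Maps each word to the number of times it has been seen.
    Dictionary<string, int> occurrences = new Dictionary<string, int>();

    foreach (var word in splitS) {
      if (!occurrences.ContainsKey(word)) {
        // First time we see this word.
        occurrences[word] = 1;
      } else {
        // Seen before: increment its count.
        occurrences[word]++;
      }
    }

    // Print every word with its count; a count above 1 marks a duplicate.
    foreach (KeyValuePair<string, int> entry in occurrences) {
      Console.WriteLine(entry.Key + ": " + entry.Value);
    }
  }
}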
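
The program above prints the count for every word. To report only the duplicates, the same dictionary can be filtered for counts greater than one. The sketch below is one possible way to do that, not necessarily the lesson's final solution; the class name DuplicateSketch, the helper FindDuplicates, and the sample sentence are hypothetical. It also keeps a separate list of first-seen words so the result order does not depend on dictionary enumeration order.

```csharp
using System;
using System.Collections.Generic;

class DuplicateSketch {
  // Hypothetical helper: returns each word that occurs more than once,
  // in the order the words first appear in the sentence.
  public static List<string> FindDuplicates(string s) {
    string[] words = s.ToLower().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

    var occurrences = new Dictionary<string, int>();
    var firstSeen = new List<string>();

    foreach (string word in words) {
      if (occurrences.ContainsKey(word)) {
        occurrences[word]++;
      } else {
        occurrences[word] = 1;
        firstSeen.Add(word);
      }
    }

    // Keep only the words whose count exceeds one.
    var duplicates = new List<string>();
    foreach (string word in firstSeen) {
      if (occurrences[word] > 1) {
        duplicates.Add(word);
      }
    }
    return duplicates;
  }

  public static void Main(string[] args) {
    List<string> result = FindDuplicates("The dog is the best and the dog knows it");
    Console.WriteLine(string.Join(", ", result));
    // prints: the, dog
  }
}
```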
