The Foundation: Understanding the Problem
To find duplicate words in a sentence, we need a systematic way to count the occurrences of each word. Our task is to examine each word in the given string and keep track of how often it appears.
Step 1: Tokenizing the Sentence into Words
What Does Tokenizing Mean?
Tokenization is the process of converting a sequence of text into individual "tokens" or units. In our context, this means splitting the given sentence into individual words.
How to Tokenize?
We can start by converting the entire string to lowercase so that the comparison is case-insensitive. Then we'll use the Split method to break it into an array of words.
string[] splitS = s.ToLower().Split(' ');
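The tokenizing step can be tried out on its own. The following is a minimal sketch, not the lesson's reference solution: the class name TokenizeDemo and the helper Tokenize are hypothetical, and the use of StringSplitOptions.RemoveEmptyEntries is an added assumption so that consecutive spaces do not produce empty tokens. Punctuation is left attached to words, just as with the one-line version above.

```csharp
using System;

class TokenizeDemo {
    // Hypothetical helper (not part of the original lesson): lowercases the
    // sentence and splits it on spaces. RemoveEmptyEntries is an assumption
    // added here so that runs of spaces do not yield empty strings.
    public static string[] Tokenize(string s) {
        return s.ToLower().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main(string[] args) {
        // Note the double space between "quick" and "brown".
        string[] words = Tokenize("The quick  brown the");
        Console.WriteLine(string.Join(",", words)); // prints: the,quick,brown,the
    }
}
```

Because the sentence is lowercased first, "The" and "the" end up as the same token, which is what later allows them to be counted together.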
using System;
using System.Collections.Generic;

class Program {
    public static void Main(string[] args) {
        string s = "Original String";

        // Tokenize: lowercase the sentence, then split it on spaces.
        string[] splitS = s.ToLower().Split(' ');

        // Map each word to the number of times it has been seen.
        Dictionary<string, int> occurrences = new Dictionary<string, int>();
        foreach (var word in splitS) {
            if (!occurrences.ContainsKey(word)) {
                occurrences[word] = 1;   // first time we see this word
            } else {
                occurrences[word]++;     // seen before: increment its count
            }
        }

        // Print every word with its count.
        foreach (KeyValuePair<string, int> entry in occurrences) {
            Console.WriteLine(entry.Key + ": " + entry.Value);
        }
    }
}
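The program above prints a count for every word. To get from those counts to the duplicate words themselves, one possible approach is to record a word the moment its count reaches two. The sketch below assumes that approach; the class DuplicateDemo, the helper FindDuplicates, and the sample sentence are hypothetical and are not taken from the lesson.

```csharp
using System;
using System.Collections.Generic;

class DuplicateDemo {
    // Hypothetical helper: returns each word that occurs more than once,
    // in the order in which its second occurrence is encountered.
    public static List<string> FindDuplicates(string s) {
        string[] words = s.ToLower().Split(' ');
        var occurrences = new Dictionary<string, int>();
        var duplicates = new List<string>();

        foreach (string word in words) {
            if (word.Length == 0) {
                continue; // skip empty tokens produced by consecutive spaces
            }

            int count;
            occurrences.TryGetValue(word, out count); // count stays 0 if the word is new
            count++;
            occurrences[word] = count;

            if (count == 2) {
                duplicates.Add(word); // record the word once, on its second sighting
            }
        }
        return duplicates;
    }

    public static void Main(string[] args) {
        List<string> result = FindDuplicates("The dog and the cat and the bird");
        Console.WriteLine(string.Join(", ", result)); // prints: the, and
    }
}
```

Adding a word only when its count equals two keeps the result free of repeats, even for a word such as "the" that appears three times.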