AlgoDaily - Introduction to Datastores

Understanding Basic Data Structures

Datastores rely heavily on basic data structures. Think of these structures as the building blocks for creating and managing databases and other forms of data storage.

For instance, consider relational databases like Oracle, MySQL, MSSQL, SQLite, or PostgreSQL. These databases are fundamentally structured around tables, a type of data structure. Similarly, many NoSQL databases, which are often used in machine learning or AI applications, are organized around key-value pairs, another fundamental data structure.

In Python, lists and dictionaries are widely used data structures. A list, as you might already know, is a simple collection of items. Here is an example of a list that contains the names of different databases:

PYTHON

myList = ['Oracle', 'MySQL', 'MSSQL', 'SQLite', 'PostgreSQL']

A dictionary, on the other hand, stores data as key-value pairs. It resembles the structure of a NoSQL database. Here's an example:

PYTHON

myDictionary = {'key1':'value1', 'key2':'value2', 'key3':'value3'}

Understanding these basic data structures is crucial when it comes to building datastores from scratch, as it's these structures that form the foundation of how data is stored, accessed, and manipulated.

if __name__ == "__main__":
    # Python logic here
    myList = ['Oracle', 'MySQL', 'MSSQL', 'SQLite', 'PostgreSQL']
    myDictionary = {'key1':'value1', 'key2':'value2', 'key3':'value3'}
    for i in myList:
        print('Database: ', i)
    for key, value in myDictionary.items():
        print('Key: ', key, ' Value: ', value)
    print('Understanding Basic Data Structures')
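
To connect these two structures back to datastores, here is a minimal sketch of how a relational "table" and a key-value "store" might be modelled with a list and a dictionary. The column names and sample rows are made up for illustration, and real databases use far more elaborate on-disk structures.

```python
# A "table" modelled as a list of rows, where each row is a dictionary
# mapping column names to values. Column names and rows are illustrative.
databases = [
    {"name": "PostgreSQL", "kind": "relational"},
    {"name": "SQLite", "kind": "relational"},
    {"name": "Redis", "kind": "key-value"},
]

# Roughly: SELECT name FROM databases WHERE kind = 'relational'
relational_names = [row["name"] for row in databases if row["kind"] == "relational"]

# A key-value "store" is a dictionary: a lookup goes straight to the key
# instead of scanning every row.
kind_by_name = {row["name"]: row["kind"] for row in databases}

print(relational_names)
print(kind_by_name["Redis"])
```

The list version has to scan every row to answer a query, while the dictionary version jumps directly to a key. That trade-off comes up again and again when comparing datastores.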

Let's test your knowledge. Fill in the missing part by typing it in.

In Python, a data structure that stores data as key-value pairs is called a ___.

Write the missing line below.

Getting to Know PostgreSQL and Its Primitives

PostgreSQL is a robust, open-source relational database system. It uses and extends the SQL language and combines it with many features that safely store and scale complicated data workloads, making it popular in industries like finance and AI.

One of PostgreSQL's primary constructs is the table. As in other relational database management systems (RDBMS), a table is a collection of related data held in a structured format within a database. It consists of columns and rows; loosely, you can picture it as a Python list of rows in which each row is a dictionary keyed by column name. Data can be accessed, managed, and manipulated using SQL commands.

PostgreSQL also has the concept of a view. A view is a pseudo-table: it is not a stored table but a named SELECT statement whose result set can be queried like a table. A view can contain all rows of a table or only the rows that meet specific conditions.
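
The difference between a table and a view can be sketched without a running PostgreSQL server. The example below uses Python's built-in sqlite3 module purely as a stand-in (SQLite is not PostgreSQL, although `CREATE TABLE` and `CREATE VIEW` look much the same for a case this simple). The `mobile` schema, the `premium_mobile` view, and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A table physically stores rows. This 'mobile' schema is hypothetical.
cur.execute("CREATE TABLE mobile (id INTEGER PRIMARY KEY, model TEXT, price REAL)")
cur.executemany(
    "INSERT INTO mobile (id, model, price) VALUES (?, ?, ?)",
    [(1, "Alpha", 199.0), (2, "Beta", 899.0), (3, "Gamma", 1299.0)],
)

# A view stores only the query; its rows are computed when it is read.
cur.execute(
    "CREATE VIEW premium_mobile AS SELECT id, model FROM mobile WHERE price > 500"
)
premium = cur.execute("SELECT id, model FROM premium_mobile ORDER BY id").fetchall()
print(premium)

# A new row in the underlying table shows up in the view automatically.
cur.execute("INSERT INTO mobile (id, model, price) VALUES (4, 'Delta', 650.0)")
premium_after = cur.execute(
    "SELECT id, model FROM premium_mobile ORDER BY id"
).fetchall()
print(premium_after)

conn.close()
```

Because the view holds no data of its own, the second query reflects the newly inserted row without the view being touched.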

Let's look at a Python example of interacting with PostgreSQL. We are going to connect to a PostgreSQL database, execute a SQL SELECT query, and fetch the results. (Replace the host, port, database, user, and password parameters if you want to run the code on your local system.)

In the next lesson, we are going to construct a primitive version of PostgreSQL as part of the course 'Build Datastores From Scratch.' We will wrap basic data structures with utilities and continue to expand until we've achieved 'feature parity'. Thus, this knowledge of PostgreSQL and its primitives forms a crucial foundation for the coming lessons.

import psycopg2

if __name__ == "__main__":
    # PostgreSQL example using Python
    # Initialise both handles so the finally block is safe even if
    # the connection attempt itself fails.
    connection = None
    cursor = None
    try:
        connection = psycopg2.connect(user="sysadmin",
                                      password="pynative@#29",
                                      host="127.0.0.1",
                                      port="5432",
                                      database="postgres_db")
        cursor = connection.cursor()
        postgreSQL_select_Query = "select * from mobile"
        cursor.execute(postgreSQL_select_Query)
        print("Selecting rows from mobile table using cursor.fetchall")
        mobile_records = cursor.fetchall()
        print("Print each row and its column values")
        for row in mobile_records:
            print("Id = ", row[0])
    except (Exception, psycopg2.Error) as error:
        print("Error while fetching data from PostgreSQL", error)
    finally:
        # Close the cursor and connection only if they were opened.
        if cursor is not None:
            cursor.close()
        if connection is not None:
            connection.close()
            print("PostgreSQL connection is closed")

Build your intuition. Click the correct answer from the options.

What is a 'View' in the context of PostgreSQL?

Click the option that best answers the question.

A pseudo-table which is the result-set of a SELECT statement
A type of database trigger
A visual user-interface for managing PostgreSQL
Another name for PostgreSQL table

Understanding Redis and Its Primitives

Redis is an open-source, in-memory data structure store that implements a distributed key-value database with optional durability. It can be used as a database, cache, message broker, and queue. It supports various data structures, including strings, lists, sets, sorted sets, and hashes.

Common use cases of Redis include caching, real-time analytics, publish/subscribe, job & queue management, and much more.

Redis works with an in-memory dataset to achieve outstanding performance. However, it has built-in persistence mechanisms that store data on disk for durability: RDB (Redis Database) snapshot files and the AOF (Append Only File).
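
To get a feel for the append-only idea, here is a minimal sketch of an in-memory dictionary that also appends every write to a log file and replays that log on startup. It illustrates the concept only, not how Redis implements AOF (Redis has its own log format and configurable fsync policies). The `TinyAppendOnlyStore` class, the JSON log format, and the file name are assumptions made for this example.

```python
import json
import os
import tempfile


class TinyAppendOnlyStore:
    """In-memory dict whose writes are also appended to a log file."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()

    def _replay(self):
        # Rebuild the in-memory state by re-applying every logged write.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as log:
            for line in log:
                entry = json.loads(line)
                if entry["op"] == "set":
                    self.data[entry["key"]] = entry["value"]
                elif entry["op"] == "del":
                    self.data.pop(entry["key"], None)

    def _append(self, entry):
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(entry) + "\n")

    def set(self, key, value):
        # Log first, then apply, so the log never misses an applied write.
        self._append({"op": "set", "key": key, "value": value})
        self.data[key] = value

    def delete(self, key):
        self._append({"op": "del", "key": key})
        self.data.pop(key, None)

    def get(self, key):
        return self.data.get(key)


log_path = os.path.join(tempfile.mkdtemp(), "appendonly.log")

store = TinyAppendOnlyStore(log_path)
store.set("finance", "AI")
store.set("temp", "123")
store.delete("temp")

# Simulate a restart: a fresh object recovers its state from the log.
recovered = TinyAppendOnlyStore(log_path)
print(recovered.get("finance"))
print(recovered.get("temp"))
```

Reads never touch the disk, which is why this design stays fast. The cost is a log that keeps growing, and a snapshot (the RDB idea) is one way to keep recovery time bounded.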

Let's now dive into some Python code to interact with Redis. As with PostgreSQL, we connect from a client, but here we are dealing with an in-memory key-value datastore. We're going to connect to a Redis data store, set a key-value pair, and get the value of the key. (Please ensure Redis is up and running on port 6379 on localhost.)

In the coming lesson, we will build a primitive version of Redis as part of the 'Build Datastores From Scratch' course. This knowledge forms a crucial foundation for the upcoming lessons and will help us understand how such high-performing technologies operate under the hood and serve industries like finance and AI.

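import redis

if __name__ == "__main__":
    # Connect to a local Redis server on the default port
    r = redis.Redis(host='localhost', port=6379, db=0)
    # Store a key-value pair
    r.set('finance', 'AI')
    # Values come back as bytes unless decode_responses=True is passed to Redis()
    print(r.get('finance'))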

Try this exercise. Is this statement true or false?

Redis is a disk-based key-value store.

Press true if you believe the statement is correct, or false otherwise.

Exploring MongoDB and Its Primitives

MongoDB is a popular NoSQL database that is organized around collections and documents rather than the rows and columns of relational databases. MongoDB shines in scenarios dealing with unstructured or semi-structured data, especially when data shapes evolve over time. This makes it a strong candidate for big data storage, real-time analytics, mobile applications, content management systems, IoT applications, and more.

A collection in MongoDB can be thought of as a table in the relational model, and a document as a tuple or row. A significant difference, however, is that no fixed schema is enforced by default, which allows more flexibility in data representation.
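
The flexibility that comes from not having a fixed schema can be sketched in plain Python, with a collection as a list and each document as a dictionary. The `people` data and the `find` helper below are hypothetical and only mimic simple equality matching; MongoDB's query language is much richer.

```python
# A "collection" sketched as a list of dictionaries ("documents").
# Documents in the same collection do not need the same fields.
people = [
    {"name": "Alice", "age": 25, "interests": ["finance", "AI"]},
    {"name": "Bob", "email": "bob@example.com"},  # no age, no interests
    {"name": "Carol", "age": 31, "address": {"city": "Berlin"}},  # nested document
]


def find(collection, query):
    """Return documents whose fields equal every key/value pair in query."""
    return [
        doc
        for doc in collection
        if all(doc.get(field) == expected for field, expected in query.items())
    ]


print(find(people, {"name": "Bob"}))

# Fields can be missing, so code that reads documents must allow for that.
names_with_age = [doc["name"] for doc in people if "age" in doc]
print(names_with_age)
```

In a relational table, Bob's row would need NULLs for the missing columns, and Carol's nested address would normally live in a separate table.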

Let's now dive into some Python code to interact with MongoDB. We're going to connect to a MongoDB data store, create a collection (table), insert a document (row), and retrieve the inserted document. (Please ensure MongoDB is up and running at port 27017 on localhost in your local system).

In the subsequent lessons, we will be developing a basic version of MongoDB as part of our 'Build Datastores From Scratch' course. This fundamental learning will aid us in understanding how such robust technologies operate and how they can serve industries like finance and AI.

from pymongo import MongoClient

if __name__ == '__main__':

    # Python logic here

    # Establishing connection
    client = MongoClient('mongodb://localhost:27017/')

    # Creating a database
    db = client['sample_db']

    # Creating a collection (table)
    collection = db['person']

    # Inserting a document (row)
    collection.insert_one({'name': 'Alice', 'age': 25, 'interests': ['finance', 'AI']})

    # Fetching the document
    print(collection.find_one({'name': 'Alice'}))

    print('Inserted and fetched document successfully.')

Are you sure you're getting this? Fill in the missing part by typing it in.

MongoDB is an ideal candidate for big data storage, real-time analytics, mobile applications, content management systems, IoT applications, and more because it operates on the principle of collections and ____.

Write the missing line below.

Examining ElasticSearch and Its Primitives

ElasticSearch is a full-featured, flexible, and scalable open-source search and analytics engine. It provides a distributed, full-text search engine with a schema-free, JSON-based document structure. ElasticSearch is written in Java and can index and search documents in diverse formats. Its speed and scalability make it a valuable tool in applications requiring complex search mechanisms, such as finance or artificial intelligence.

So what makes ElasticSearch a powerhouse of big data processing? The answer is its primary data structure, the inverted index. An inverted index is a hashmap-like data structure that maps each word (term) to the documents, and the positions within them, where that word occurs. This makes text searching incredibly efficient and forms the core principle behind most modern search engines.
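
A toy inverted index takes only a few lines of Python. The sketch below maps each lowercase term to the set of document ids that contain it and answers queries by intersecting those sets. The sample documents and the function names are made up for illustration. A real engine such as Elasticsearch adds tokenization, stemming, relevance scoring, and compressed on-disk storage on top of this idea.

```python
from collections import defaultdict

# Sample documents, keyed by a made-up document id.
documents = {
    1: "Redis is an in-memory key-value store",
    2: "PostgreSQL is a relational database",
    3: "Elasticsearch is a search engine built on an inverted index",
}


def build_inverted_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


index = build_inverted_index(documents)
print(sorted(search(index, "is a")))
print(sorted(search(index, "inverted index")))
```

A lookup never scans the document text. Its cost depends on the number of query terms and the size of their posting sets, not on the total amount of text indexed.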

Now, with Python, you can interact easily with running ElasticSearch instances using the official Elasticsearch client. In the code provided, we're going to connect to a local Elasticsearch instance, index some data, and perform a search.

Note: You need to have ElasticSearch running locally for the code to execute successfully.

In the upcoming sections, we will continue our journey by building a basic version of ElasticSearch as part of the 'Build Datastores From Scratch' course. This foundational knowledge will help us understand how versatile search engines function and how they affect various industry sectors.

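from elasticsearch import Elasticsearch

if __name__ == "__main__":
    # Instantiate the Elasticsearch client. This assumes the 8.x Python
    # client and a local, unsecured node on the default port.
    es = Elasticsearch("http://localhost:9200")

    # Index some data (recent clients no longer accept doc_type)
    es.index(index='myindex', id="1", document={"mykey": "mydata"})

    # Refresh so the new document is visible to the search below
    es.indices.refresh(index='myindex')

    # Search for the indexed data
    results = es.search(index='myindex', query={"match_all": {}})

    # Print all the search results
    print('Search results:', results)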

Try this exercise. Click the correct answer from the options.

What is the primary data structure in Elasticsearch that makes text searching incredibly efficient?

Click the option that best answers the question.

Heap Map
B-Tree
Linked List
Inverted Index

Comparing Different Datastores and Their Use Cases

Understanding the functionalities, strengths, and limitations of each datastore is crucial to determine the best fitting ones for specific use-cases. With a wide array of options like PostgreSQL, Redis, MongoDB, and Elasticsearch, choosing the appropriate datastore becomes an exercise in understanding system requirements and trade-offs.

Let's highlight some usages. PostgreSQL, being an RDBMS, shines when it comes to transactional use-cases. It is a suitable option for enterprise applications demanding ACID compliance, complex queries, and integrations with applications in multiple languages.

Redis, an in-memory datastore, excels at rapid data lookups, which makes it well suited to caching, session storage, and real-time analytics. Redis' data structures, such as hashes, sets, and lists, enable diverse operational possibilities.

MongoDB works best where scalability, flexibility, and speedy development are key. Due to its document-oriented approach, it is well suited for content management and real-time analytics, and it has been used extensively for storing IoT data.

ElasticSearch suits use-cases that require complex, full-text search capabilities, such as product search with filters in an e-commerce application. Its distributed nature also makes it a good fit for analyzing logs and event data in AI-based systems or financial tech applications.

While these datastores excel in their own domains, each also has limitations. For example, Redis keeps its dataset in memory, so data can be lost on a crash unless persistence is configured, while Elasticsearch and MongoDB are generally a weaker fit than an RDBMS for heavily transactional data. PostgreSQL can become challenging to scale horizontally. Therefore, understanding these features and limitations is crucial to choosing the right datastore.

In our journey, we will try to realize some of these features from scratch, thus deepening our admiration for these technologies and their ingenious design.

if __name__ == '__main__':
  datastores = ['PostgreSQL', 'Redis', 'MongoDB', 'ElasticSearch']
  for ds in datastores:
    print(f'Studying: {ds}')
  print('Understanding the use-cases helps in making an informed decision.')

Are you sure you're getting this? Is this statement true or false?

ElasticSearch is primarily designed for transactional data.

Press true if you believe the statement is correct, or false otherwise.

Building Datastores From Scratch

Building a datastore from scratch is a bold task and an enlightening one. It's like crafting your own financial trading algorithm: you gain a deep understanding of the inner workings of the system.

The choice of language matters here. Python can be a good choice because of its simplicity and ample libraries. But remember, a language is a tool, and the concepts remain the same even if the tool changes.

Let's consider a basic example. A key-value store is a well-known primitive in data storage: it simply maps keys to values. Redis, a popular datastore, uses this model along with other data structures.

To build our own primitive version of this, let's start with Python's built-in dictionary data structure. You may perceive it as a simple Hashmap, but at its core, it represents the Key-Value data model.

The implementation seems deceptively simple, but the catch lies in scaling such a system. This is where computer science concepts such as sharding, replication, and consistent hashing come in. AI models could also be leveraged for smart caching mechanisms in our datastore.

In the upcoming screens, we will delve deeper into these topics. By the end of it, you'll be more confident in understanding how datastores function beneath the surface, and you may even be inspired to start building one yourself.

if __name__ == "__main__":
  # Python key-value datastore
  datastore = {}
  
  # Let's add some data
  datastore['Apple'] = 'AAPL'
  datastore['Microsoft'] = 'MSFT'
  datastore['Amazon'] = 'AMZN'
  
  # Let's retrieve some data
  print(datastore['Apple'])
  
  # Let's delete some data
  del datastore['Amazon']
  
  print(datastore)
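
To make the scaling concepts a little more concrete, here is a minimal sketch of sharding a key-value store with consistent hashing. Each shard is just a dictionary, and a hash ring decides which shard owns a key. The `HashRing` and `ShardedStore` classes, the shard names, and the choice of SHA-256 with 100 virtual nodes per shard are all assumptions for illustration, not the way any particular datastore does it.

```python
import bisect
import hashlib


def stable_hash(value):
    """Hash a string to an integer that is the same across runs."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring that assigns keys to named shards."""

    def __init__(self, shard_names, replicas=100):
        # Each shard gets several points ("virtual nodes") on the ring
        # so that keys spread more evenly.
        self._ring = sorted(
            (stable_hash(f"{name}#{i}"), name)
            for name in shard_names
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def shard_for(self, key):
        # Walk clockwise to the first point at or after the key's hash,
        # wrapping around to the start of the ring.
        position = bisect.bisect_left(self._points, stable_hash(key))
        return self._ring[position % len(self._ring)][1]


class ShardedStore:
    """Key-value store that spreads keys over several dictionaries."""

    def __init__(self, shard_names):
        self.ring = HashRing(shard_names)
        self.shards = {name: {} for name in shard_names}

    def set(self, key, value):
        self.shards[self.ring.shard_for(key)][key] = value

    def get(self, key):
        return self.shards[self.ring.shard_for(key)].get(key)


store = ShardedStore(["shard-a", "shard-b", "shard-c"])
store.set("Apple", "AAPL")
store.set("Microsoft", "MSFT")
store.set("Amazon", "AMZN")

print(store.get("Apple"))
print({name: sorted(shard) for name, shard in store.shards.items()})
```

The benefit over a plain `hash(key) % number_of_shards` scheme is that adding a shard only moves the keys that land on the new shard's points; every other key stays where it was.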

Build your intuition. Fill in the missing part by typing it in.

In the process of scaling our primitive key-value datastore, one of the core Computer Science concepts that comes into play is _.

Write the missing line below.

Concluding Remarks: From Learning to Development

As this lesson on datastores comes to an end and you prepare to embark on your own datastore development journey, it is worth reflecting on the concepts learned so far. We began with basic data structures, took a deep dive into PostgreSQL, Redis, MongoDB, and ElasticSearch primitives, compared different datastores, and finally discussed building datastores from scratch. Together, these steps form a comprehensive path.

This journey has not just been about understanding how to build a datastore from scratch, but also about gaining an appreciation for the intricacies and complexities of existing datastore technologies. With the knowledge we've gained, we're now better equipped -- knowing what tool to use when, and what customizations can be made to optimize the datastore to suit specific use cases.

As a seasoned engineer, you might be well familiar with blockchain technology from your experience in the finance sector. Imagine building a datastore as creating a new blockchain. Every datastore is like a new chain with its own unique blocks (data structures) and chain links (constructs and primitives). Sound coding practices, algorithmic thinking, and a heavy dose of creativity are just as crucial here.

As we think about what's next, the Python script below prompts you to take that first step. Remember, every line of code written is a stepping stone to mastery.

Become the explorer, the innovator, the creator, and above all, the learner. The true value of this journey lies in how you apply this newfound understanding to solve new problems in your projects and contributions.

Go ahead, execute the Python script below and take the first step towards your datastore development journey.

if __name__ == '__main__':
  # Python logic here
  print('This is your first step in building datastores from scratch. With the knowledge you have gained through this lesson about different primitive constructs and datastores, take the leap, experiment, fail, learn and succeed. This understanding will serve you not just in building datastores, but also in optimizing, choosing, and utilizing existing ones wisely in your future projects.')

Let's test your knowledge. Is this statement true or false?

When building a datastore from scratch, one doesn't need to apply sound coding practices, algorithmic thinking, and creativity.

Press true if you believe the statement is correct, or false otherwise.
