Introduction

Time-series databases are one of the most recent specialized database models to emerge since the introduction of the relational database management system, and they are quickly gaining popularity. According to DB-Engines, time-series databases have gained popularity faster than any other database model, by a large margin, during the last two years. This lesson covers time-series databases in depth, including their differentiating features, benefits, and popular applications.

What is Time-Series Data?

Time-series data is data collected at successive points in time. This differs from cross-sectional data, which observes people, businesses, and other entities at a single point in time. Because the data points in a time series are collected at neighboring periods, observations can be correlated with one another. Time-series frequency refers to how often data is collected over a given period.

Time-series data have the following characteristics:

  • Data is gathered over a set period.
  • New data is written as inserts rather than as updates that replace existing data.
  • When data is written, it is automatically assigned to the most recent time interval.

Processing time-series data involves the following five steps:

  1. Data model: The gathered time-series data must conform to the model's specification, which captures the characteristic properties of time-series data.

  2. Stream computing: This step performs pre-aggregation, downsampling, and post-aggregation on the incoming time-series data.

  3. Data storage: The storage system separates hot and cold data and supports efficient range queries. It should also offer high throughput, large capacity, and low-cost storage.

  4. Metadata retrieval: This step supports several retrieval methods for storing and looking up timeline metadata at the scale of tens of millions to hundreds of millions of timelines.

  5. Data analysis: The last step analyzes and computes time-series data in real-time.
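
To make the downsampling part of the stream computing step more concrete, here is a minimal sketch in Python. It assumes readings arrive as (Unix timestamp in seconds, value) pairs and averages them into fixed-width buckets. The function name downsample and the sample readings are hypothetical, chosen only for illustration; real systems usually do this inside the database or a stream processor.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, bucket_seconds):
    """Average (timestamp, value) pairs into fixed-width time buckets."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of the bucket it falls into.
        bucket_start = ts - (ts % bucket_seconds)
        buckets[bucket_start].append(value)
    # Emit one averaged point per bucket, ordered by time.
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Hypothetical raw readings: (seconds, value)
raw = [(0, 10.0), (20, 14.0), (40, 12.0), (60, 20.0), (90, 22.0)]
per_minute = downsample(raw, 60)
print(per_minute)  # [(0, 12.0), (60, 21.0)]
```

Five raw points become two one-minute averages, which is the kind of reduction that lets older, colder data be stored cheaply.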

Let's test your knowledge. Is this statement true or false?

The time-series frequency refers to the frequency at which data is collected over a given period.

Press true if you believe the statement is correct, or false otherwise.

What is a Time-Series Database?

In many cases, enormous amounts of data arrive at a high rate and must be stored. Financial markets, trading companies, hedge funds, and stock exchanges are continuously updated with real-time market data and activity, and each transaction on each product feeds into trading decisions. In each of these cases the data itself is critical, but so is the timestamp at which it was generated. The timestamp lets us extract insights from the data in near real time or later in batch processing. Time-series databases (TSDBs) are built for this problem. A TSDB is a database designed and optimized for high data rates and time-stamped data. Time-series data can come from a variety of events or measurements, but it is always collected across multiple points in time rather than as a single event.

The image above displays the trade value on various days of the month. A TSDB stores the data as (time, value) pairs. When data is stored this way, evaluating time series data is simple. Concurrent series, or many variables being measured at the same time, can also be managed with this database. For example, in the image below, there is a value for Air Quality Index (AQI) and a density value. Time, however, is the most significant variable in a time-series database.
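
The (time, value) layout can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular TSDB is implemented: the Series class, the day numbers, and the AQI and density figures are all hypothetical, and points are assumed to arrive in time order. Keeping the timestamps sorted is what makes a time-range query a cheap binary search.

```python
from bisect import bisect_left, bisect_right


class Series:
    """One time series stored as parallel, time-ordered lists."""

    def __init__(self):
        self.times = []
        self.values = []

    def append(self, ts, value):
        # Assumption: points arrive in non-decreasing time order.
        if self.times and ts < self.times[-1]:
            raise ValueError("out-of-order timestamp")
        self.times.append(ts)
        self.values.append(value)

    def range(self, start, end):
        """Return all (time, value) pairs with start <= time <= end."""
        lo = bisect_left(self.times, start)
        hi = bisect_right(self.times, end)
        return list(zip(self.times[lo:hi], self.values[lo:hi]))


# Concurrent series: two variables measured at the same times.
series = {"aqi": Series(), "density": Series()}
for day, aqi, density in [(1, 42, 0.8), (2, 55, 1.1), (3, 61, 1.3), (4, 48, 0.9)]:
    series["aqi"].append(day, aqi)
    series["density"].append(day, density)

print(series["aqi"].range(2, 3))  # [(2, 55), (3, 61)]
```

Each variable gets its own series, and time is the shared key that lines them up.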

Let's test your knowledge. Is this statement true or false?

A time-series database stores data as (time, value) pairs.

Press true if you believe the statement is correct, or false otherwise.

Time-Series Database vs Relational Database

Choosing the right database will save you a lot of time and effort and will help you make informed decisions. Part of a data analyst's job is to recommend the best database for a given situation. Time-series databases (TSDBs) and relational databases (RDBMSs) both have well-documented use cases and user bases, so their differences are well documented too. For instance, TSDBs are built to ingest very high write volumes, on the order of 1M+ inserts per second, which is difficult to sustain with a typical RDBMS. RDBMSs have full SQL support, while many TSDBs offer only an SQL-like query language or none at all; TSDBs built on a relational engine are the exception. TSDBs save storage and computation costs, but they do not provide capabilities that were previously unavailable. The RDBMS has served the industry well for a long time and remains fully capable.

In an RDBMS, data reflects the current state of an entity. RDBMSs are generally used for:

  • Content management systems
  • Storing transactional data
  • Data that needs to be updated over time
  • Long-term storage

TSDBs, on the other hand, are generally used for:

  • Monitoring systems over time
  • Analytical and reporting data processing
  • Append-only changes
  • Short-lived data sets
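
The difference between "current state" and "append-only" storage can be illustrated with a small Python sketch. Both stores here are plain in-memory structures, and the sensor name and readings are hypothetical; the point is only to contrast the two write patterns, not to model a real RDBMS or TSDB.

```python
# RDBMS-style: one record per entity, updated in place.
current_state = {}

# TSDB-style: every reading is appended; nothing is overwritten.
history = []


def update_temperature(sensor_id, value):
    """Overwrite the stored value, keeping only the latest state."""
    current_state[sensor_id] = value


def record_temperature(sensor_id, ts, value):
    """Append a new time-stamped reading to the history."""
    history.append((ts, sensor_id, value))


# Hypothetical readings: (seconds, temperature)
for ts, value in [(100, 20.5), (160, 21.0), (220, 21.4)]:
    update_temperature("sensor-1", value)
    record_temperature("sensor-1", ts, value)

print(current_state)  # {'sensor-1': 21.4}
print(len(history))   # 3
```

After three writes, the in-place store knows only the latest temperature, while the append-only store can still answer how the temperature changed over time.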

Build your intuition. Click the correct answer from the options.

Which of the following statements is true about time-series databases?

Click the option that best answers the question.

  • Time-series databases can handle a huge volume of data
  • Many time-series databases lack full SQL support
  • Time-series databases save storage and calculation costs
  • All of the above

Major Time-Series Databases

Time-series databases are the fastest-growing segment of the database industry. Several TSDBs are currently on the market, and this section explores a few of the major ones.

InfluxDB

InfluxDB is a popular time-series database implemented in the Go programming language. It is a data ingestion and storage engine built from the ground up to be highly scalable. It excels at collecting, storing, querying, visualizing, and acting on real-time streams of time-series data, events, and metrics. Developers can query the data with an SQL-like language, which makes it simple to integrate into their applications. InfluxDB is also available as part of a commercial offering that includes the entire stack for processing time-series data in a full-featured, highly available environment.

Prometheus

Prometheus is an open-source monitoring tool for extracting insights from metrics data and triggering alerts as needed. It has a disk-based local time-series database that stores data in a custom format. Prometheus' data model is multi-dimensional and time series-based, with all data stored as streams of timestamped values. It is most useful for purely numeric time series. Its ability to collect and query data from microservices is one of its strongest features.

TimescaleDB

TimescaleDB is an open-source, scalable relational database for time-series data, built on PostgreSQL. It comes in two flavors. The first is a free community edition that you can install on your own server. The second is TimescaleDB Cloud, which offers fully hosted and managed cloud infrastructure for your deployment requirements. Because it is relational, you can join time-series data with metadata and other non-time-series tables to enrich results and perform more complex filtering. Using PostgreSQL's GIS support, you can track geographical locations over time. TimescaleDB can also take advantage of PostgreSQL's scaling features, including replication.

Graphite

Graphite is a complete solution for storing and viewing real-time time-series data. It can store time-series data and render graphs on demand. It does not, however, gather data for you; for that you can use tools such as collectd, Ganglia, Sensu, Telegraf, and others. Graphite has three components: Carbon, Whisper, and Graphite-Web. Carbon receives time-series data, aggregates it, and saves it to disk. Whisper is the storage system that holds the time-series data. Graphite-Web is the front end for building dashboards and displaying data.
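
Because Graphite leaves data collection to you, it helps to see what sending a data point looks like. Carbon accepts a plaintext format of one "metric.path value timestamp" line per point, by default on TCP port 2003. The Python sketch below is a minimal illustration: the helper names, the metric path, and the host are hypothetical, and the send function is defined but not called because it needs a running Carbon instance.

```python
import socket


def format_metric(path, value, timestamp):
    """Build one line in Carbon's plaintext format: 'path value timestamp'."""
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(host, line, port=2003, timeout=5.0):
    """Send one plaintext line to Carbon (2003 is the default plaintext port)."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(line.encode("ascii"))


# Hypothetical metric path and reading.
line = format_metric("servers.web01.cpu.load", 0.42, 1700000000)
print(repr(line))  # 'servers.web01.cpu.load 0.42 1700000000\n'

# With a Carbon instance running (the host name is an assumption):
# send_metric("graphite.example.com", line)
```

Collectors such as collectd or Telegraf do essentially this on a schedule, for many metrics at once.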

Try this exercise. Click the correct answer from the options.

Which time-series database is implemented in the Go programming language?

Click the option that best answers the question.

  • Graphite
  • TimescaleDB
  • InfluxDB
  • All of the above

Time-Series Database Use Cases

Accessing IoT Data

Most IoT deployments, such as connected water, energy, and temperature meters, require regular data collection and reporting. Time-series analysis of the time-stamped data points can identify seasonal patterns, average consumption, and inefficiencies. A connected pH meter attached to a TSDB, for example, can alert the technician responsible for maintaining a certain pH level that a specific vat of water is becoming too acidic. IoT endpoints collect massive volumes of data, which requires highly scalable time-series databases.
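
The pH example boils down to a threshold check over time-stamped readings. Here is a minimal Python sketch of that idea. The vat names, readings, and the 6.5 threshold are hypothetical, and a production system would typically run a rule like this inside the database or an alerting tool rather than in application code.

```python
def acidity_alerts(readings, min_ph):
    """Return the (timestamp, vat, pH) readings that fall below min_ph."""
    return [(ts, vat, ph) for ts, vat, ph in readings if ph < min_ph]


# Hypothetical readings: (seconds, vat id, pH)
readings = [
    (0, "vat-1", 7.0),
    (60, "vat-1", 6.8),
    (120, "vat-1", 6.4),
    (120, "vat-2", 7.1),
]

alerts = acidity_alerts(readings, min_ph=6.5)
print(alerts)  # [(120, 'vat-1', 6.4)]
```

Because every reading carries a timestamp, the technician can see not only that vat-1 crossed the threshold but also when, and how quickly it was drifting beforehand.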

Forecasting Financial Trends

Accurately predicting financial trends from time-series data alone is extremely difficult. A TSDB, however, can provide a lot of contextual data to aid analysts. Consider the stock market: a significant surge in airline stocks could coincide with holiday travel, or a change in corporate leadership may unsettle investors and cause a stock to drop briefly. Time-series databases make it simple to cross-reference such data, which yields a richer, clearer picture.

Monitoring Web Services

Businesses can use time-series databases to assess the performance of their applications and web properties. The open-source monitoring system Prometheus, for example, includes a time-series database that lets engineers track performance patterns over time. This helps them notice problems quickly, schedule maintenance, and respond to incidents to maintain an optimal user experience. Some web and mobile apps use a TSDB to record events such as a button click, a video play, or a content share. These events can be used to map a user's path, highlight friction points or performance bottlenecks, and streamline more complex workflows.

Sales Forecasting

Retail shops must constantly estimate future sales in order to stock their shelves appropriately. With time-series databases, retailers can apply statistical models to historical data and cross-reference the results with customer behavior trends to predict future patterns and make informed decisions about which products to keep in stock and when.
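
One of the simplest statistical models over historical data is a moving average, which forecasts the next period as the mean of the most recent ones. The Python sketch below uses hypothetical weekly unit sales and a three-week window; it illustrates the idea only, and real retail forecasts usually also account for seasonality and trends.

```python
def moving_average_forecast(history, window):
    """Forecast the next value as the mean of the last `window` values."""
    if window <= 0 or len(history) < window:
        raise ValueError("need at least `window` historical values")
    recent = history[-window:]
    return sum(recent) / window


# Hypothetical weekly unit sales, oldest first.
weekly_sales = [120, 135, 128, 140, 150, 145]
forecast = moving_average_forecast(weekly_sales, window=3)
print(forecast)  # 145.0
```

The forecast uses only the last three weeks (140, 150, 145), so it reacts to recent demand while smoothing out single-week noise.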

Anomaly Detection

Anomaly detection identifies unusual deviations in time-series data. Whenever a system changes, time-series data captures a value. Organizations can use these values to track changes, uncover how changes occurred in the past, keep track of what is happening now, and use the accumulated data to forecast future occurrences.

A major aspect of detecting anomalies is visualization. A time-series graph, for example, provides the visual aid that many people want when looking for outliers. Another option is automated anomaly detection, which can speed up the process by providing real-time information. This makes it possible to correct outliers swiftly.
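
As one possible form of automated anomaly detection, the Python sketch below flags a point when it lies more than a chosen number of standard deviations from the mean of the readings just before it (a rolling z-score). The window size, threshold, and sample data are hypothetical, and this is only one of many techniques; the lesson does not prescribe a specific method.

```python
from statistics import mean, stdev


def rolling_zscore_anomalies(points, window, threshold):
    """Flag (time, value) points far from the mean of the preceding window."""
    anomalies = []
    for i in range(window, len(points)):
        previous = [value for _, value in points[i - window:i]]
        mu = mean(previous)
        sigma = stdev(previous)
        ts, value = points[i]
        # Skip flat windows to avoid dividing by zero.
        if sigma > 0 and abs(value - mu) / sigma > threshold:
            anomalies.append((ts, value))
    return anomalies


# Hypothetical readings taken once a minute: (seconds, value)
values = [10, 11, 10, 12, 11, 10, 50, 11, 10]
points = [(i * 60, v) for i, v in enumerate(values)]

anomalies = rolling_zscore_anomalies(points, window=5, threshold=3)
print(anomalies)  # [(360, 50)]
```

Only the spike to 50 is flagged. The readings right after it are not, because the spike inflates the standard deviation of the windows that contain it, which is a known limitation of this simple approach.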

Let's test your knowledge. Is this statement true or false?

Anomaly detection is one of the major use cases of time-series databases.

Press true if you believe the statement is correct, or false otherwise.

The Future of Time-Series Databases

IoT and smart devices are increasingly incorporated into our everyday lives, high-traffic websites generate millions of events per day, and market trading is expanding. These trends will only increase the demand for TSDBs. In the near future, almost every industry will likely run a TSDB in its production architecture for monitoring purposes.

Conclusion

In this lesson, you learned about time-series databases (TSDBs), their different use-cases, and various popular TSDBs present in the market.

One Pager Cheat Sheet

  • Time-series databases have grown in popularity faster than other database models over the last two years, which makes them worth studying for their differentiating features and benefits.
  • Time-series data refers to data collected over a set period, and its processing involves five different steps - Data Model, Stream Computing, Data Storage, Metadata retrieval and Data Analysis.
  • The time-series frequency refers to the rate at which data is collected over a given period.
  • A Time-series Database (TSDB) is a specialized database optimized for handling large volumes of time-stamped data that allows for retrieval and analysis of data through concurrently stored variables.
  • A Time-Series Database (TSDB) is designed to store and query time-stamped (time, value) pairs and measure sequential and concurrent events.
  • Choosing the right database for your data needs is key, and understanding the differences between Time-Series Database and Relational Database makes this decision easier.
  • Time-Series Databases (TSDBs) can ingest a high volume of data and often lack full SQL support; they are well suited to monitoring systems over time, with cost savings and efficient processing of analytical and reporting data sets.
  • Time-series databases are the fastest-growing segment of the database industry; major examples such as InfluxDB, Prometheus, TimescaleDB and Graphite provide different solutions for collecting, storing and querying real-time streams of time-series data.
  • InfluxDB is a time-series database built with Go that enables scalability and simple integration for collecting, storing, querying, visualizing, and acting on data.
  • Time-series databases offer a range of uses such as accessing IoT data, forecasting financial trends, monitoring web services, sales forecasting, and detecting anomalies using visualization and automated techniques.
  • Time-series databases provide the tools to visualize and detect outliers for anomaly detection, allowing organizations to track and uncover past changes, monitor real-time changes, and forecast future occurrences by understanding changes in data over time.
  • In the near future, TSDBs will likely be an essential component in almost every industry, due to the growing volume of real-time events caused by the incorporation of IoT/smart devices into everyday life.
  • You learned about time-series databases (TSDBs), their different use-cases and various popular TSDBs in the market.