Azure Data Platform Overview: Data Storage

Microsoft Azure offers a range of data platform technologies for handling data of different volumes and structures.

Data engineers work on complex data problems across many industries, with the goal of turning data into value. Understanding data structures and the capabilities of each data platform technology helps a data engineer choose the right tool for the job.

Data Structures

Structured data: Data stored according to a predefined schema, typically in a database table made up of rows and columns. Structured data is also referred to as relational data, because the schema defines the table, the fields in the table, and the explicit relationships between tables.

Semi-structured data: Data that does not conform to the formal structure of relational tables, but carries its own tags or markers (such as field names) that separate its elements and describe their hierarchy. JSON and XML documents are common examples.

Unstructured data: Data that has no predefined structure, which also means there are no restrictions on the kinds of data it can hold. Examples include PDF documents, JPG images, and video content.
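To make the three categories concrete, here is a minimal sketch using only the Python standard library. The table, field names, and sample values are hypothetical: `sqlite3` stands in for a relational database with a declared schema, JSON documents show self-describing records whose fields can differ from one document to the next, and a few raw bytes stand in for an unstructured file such as a JPG image.

```python
import json
import sqlite3

# Structured: the schema (table, columns, types) is declared before any data is stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)")
conn.execute("INSERT INTO customers (id, name, city) VALUES (?, ?, ?)", (1, "Ada", "Istanbul"))
row = conn.execute("SELECT name, city FROM customers WHERE id = 1").fetchone()

# Semi-structured: each JSON document carries its own field names, and documents
# in the same collection may have different or nested fields.
docs = [
    json.loads('{"id": 1, "name": "Ada", "city": "Istanbul"}'),
    json.loads('{"id": 2, "name": "Grace", "phones": ["555-0100"], "address": {"zip": "34000"}}'),
]
fields_per_doc = [sorted(d.keys()) for d in docs]

# Unstructured: an opaque sequence of bytes with no fields to query.
blob = bytes([0xFF, 0xD8, 0xFF, 0xE0])  # the first bytes of a typical JPEG file

print(row)  # ('Ada', 'Istanbul')
print(fields_per_doc)
```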

Storage Solutions

Azure Storage provides reliable and durable data storage options that are cloud-based, secure, and scalable. It includes the Azure Blob, Azure Data Lake Storage Gen2, Azure Files, Azure Queues, and Azure Tables services.

Some of its significant benefits are as follows:

  • Automatic backup and data recovery
  • Replication across locations around the world
  • Data analysis support
  • Encryption
  • Support for multiple data types
  • Data storage on virtual disks
  • Storage tiers (for example, the hot, cool, and archive access tiers for blobs)
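All of these services live inside a storage account, and each one is reached through its own endpoint. The sketch below is a hypothetical helper, not part of any Azure SDK, that builds the default endpoint URLs for one account. It assumes the Azure public cloud `core.windows.net` suffixes and the account naming rule of 3 to 24 lowercase letters and digits; sovereign clouds and custom domains use different host names.

```python
import re

# Default service endpoint suffixes in the Azure public cloud (assumption: not a sovereign cloud).
SERVICE_SUFFIXES = {
    "blob": "blob.core.windows.net",    # Azure Blob Storage
    "dfs": "dfs.core.windows.net",      # Azure Data Lake Storage Gen2
    "file": "file.core.windows.net",    # Azure Files
    "queue": "queue.core.windows.net",  # Azure Queue Storage
    "table": "table.core.windows.net",  # Azure Table Storage
}

# Storage account names: 3 to 24 characters, lowercase letters and digits only.
ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def service_endpoints(account: str) -> dict[str, str]:
    """Return the default HTTPS endpoint of each storage service for one account."""
    if not ACCOUNT_NAME.fullmatch(account):
        raise ValueError(f"invalid storage account name: {account!r}")
    return {svc: f"https://{account}.{suffix}" for svc, suffix in SERVICE_SUFFIXES.items()}


# "contosodata" is a hypothetical account name.
for name, url in service_endpoints("contosodata").items():
    print(f"{name:5} {url}")
```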

Azure Blob Storage is Microsoft's object storage solution for the cloud. Typical uses include:

  • Serving images or documents directly to a browser
  • Storing files for distributed access
  • Streaming video and audio
  • Storing data for backup and restore, disaster recovery, and archiving
  • Storing data for analysis by an on-premises or Azure-hosted service
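Inside a container, Blob Storage uses a flat namespace: a "folder" such as `images/` is only part of a blob's name, and the List Blobs operation can group names by a prefix and a delimiter (usually `/`). The pure-Python sketch below imitates that grouping over an in-memory set of hypothetical blob names; it does not call Azure.

```python
def list_blobs(names, prefix="", delimiter="/"):
    """Emulate a prefix/delimiter listing over a flat set of blob names.

    Returns (blobs, virtual_dirs): the names directly under `prefix`, and the
    distinct sub-prefixes that end at the next delimiter ("virtual folders").
    """
    blobs, dirs = [], set()
    for name in sorted(names):
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            blobs.append(name)                   # no further delimiter: a blob at this level
        else:
            dirs.add(prefix + rest[: cut + 1])   # group everything deeper under one prefix
    return blobs, sorted(dirs)


# Hypothetical contents of one container: the "folders" exist only in the names.
container = {
    "index.html",
    "images/logo.png",
    "images/2024/banner.jpg",
    "backups/db-2024-01-01.bak",
}

print(list_blobs(container))             # (['index.html'], ['backups/', 'images/'])
print(list_blobs(container, "images/"))  # (['images/logo.png'], ['images/2024/'])
```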

Azure Data Lake Storage Gen2 is a comprehensive, scalable, and cost-effective storage solution for big data analytics on Azure. Built on Azure Blob Storage, it adds support for big data analytics on top of the existing Blob service and its offerings. Azure Data Lake Storage Gen2 can be used together with the following Azure services: Azure Data Factory, Azure Databricks, Azure Event Hubs, Azure Logic Apps, Azure Machine Learning, Azure Cognitive Search, Azure Stream Analytics, Data Box, HDInsight, IoT Hub, Power BI, SQL Data Warehouse (now part of Azure Synapse Analytics), and SQL Server Integration Services (SSIS).

It plays an important role in big data architectures such as:

  • Modern data warehouse
  • Advanced analytics on big data
  • Real-time analytics solutions
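What Data Lake Storage Gen2 adds on top of Blob Storage is a hierarchical namespace, in which directories are real objects rather than name prefixes. A practical consequence for analytics jobs is that renaming or moving a directory becomes a single metadata operation instead of one operation per file. The toy model below uses hypothetical functions, not the service's implementation, and only counts operations to show the difference for a directory of 1,000 files.

```python
from dataclasses import dataclass, field


def rename_prefix_flat(store: dict, old: str, new: str) -> int:
    """Flat namespace: a 'folder' is only a name prefix, so every object under it
    is rewritten (copied to the new name, old name deleted). Returns the op count."""
    ops = 0
    for name in [n for n in store if n.startswith(old)]:  # snapshot before mutating
        store[new + name[len(old):]] = store.pop(name)
        ops += 1
    return ops


@dataclass
class Directory:
    """Hierarchical namespace: a directory is a real node that owns its children."""
    children: dict = field(default_factory=dict)


def rename_dir_hierarchical(parent: Directory, old: str, new: str) -> int:
    """Rename one child entry; the whole subtree moves with it in a single step."""
    parent.children[new] = parent.children.pop(old)
    return 1


files = {f"part-{i:04}.parquet": b"" for i in range(1000)}  # hypothetical output files

flat = {f"raw/2024/{name}": data for name, data in files.items()}
flat_ops = rename_prefix_flat(flat, "raw/2024/", "raw/2024-archived/")

root = Directory({"raw": Directory({"2024": Directory(dict(files))})})
tree_ops = rename_dir_hierarchical(root.children["raw"], "2024", "2024-archived")

print(flat_ops, tree_ops)  # 1000 1
```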

[Figure: the role of Azure Data Lake Storage Gen2 in a real-time analytics architecture]

Azure Files provides fully managed, cross-platform file shares in the cloud. The shares can be accessed simultaneously from the cloud and from on-premises Windows, Linux, and macOS environments. Azure file shares can also be cached on Windows Servers with Azure File Sync, for fast access close to where the data is used. Because Azure Files supports the industry-standard SMB protocol, you can replace your on-premises file shares with Azure file shares without worrying about application compatibility. File shares can be created without managing hardware or an operating system, so there is no server operating system to update or patch and no faulty hard drive to replace.
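Because access goes through SMB, an application that already uses ordinary file I/O does not need a special SDK once the share is mounted (for example, as a mapped drive on Windows or an SMB mount on Linux). In the sketch below the share's mount point is a hypothetical parameter, and a temporary local directory stands in for it so the example runs anywhere; the mount step itself is not shown.

```python
import tempfile
from pathlib import Path


def append_log_line(share_root: Path, line: str) -> Path:
    """Append a line to a log file using plain file I/O.

    `share_root` is assumed to be the local mount point of an Azure file share;
    the code is identical whether the directory is local or an SMB-mounted share.
    """
    log_dir = share_root / "logs"
    log_dir.mkdir(parents=True, exist_ok=True)
    log_file = log_dir / "app.log"
    with log_file.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
    return log_file


# A temporary directory stands in for the mounted share in this example.
with tempfile.TemporaryDirectory() as tmp:
    log = append_log_line(Path(tmp), "job started")
    append_log_line(Path(tmp), "job finished")
    lines = log.read_text(encoding="utf-8").splitlines()

print(lines)  # ['job started', 'job finished']
```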

Azure Queue Storage is a service for storing large numbers of messages. Messages can be accessed from anywhere in the world through authenticated calls over HTTP or HTTPS. A queue message can be up to 64 KB in size, and a queue can contain millions of messages, up to the total capacity limit of a storage account. Queues are commonly used to create a backlog of work to be processed asynchronously.
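The asynchronous worklist pattern relies on how a storage queue hands out messages: receiving a message hides it from other consumers for a visibility timeout, and the consumer deletes it only after finishing the work, so a message whose consumer crashed becomes visible again and is retried. The in-memory class below is a hypothetical model of those semantics (including the 64 KB size check), not the Azure SDK; a fake clock replaces real waiting.

```python
import time
import uuid
from dataclasses import dataclass

MAX_MESSAGE_BYTES = 64 * 1024  # per-message size limit of Azure Queue Storage (64 KB)


@dataclass
class Message:
    id: str
    content: str
    visible_at: float  # time from which the message can be received again


class InMemoryQueue:
    """Toy stand-in for a storage queue: receiving a message hides it for a
    visibility timeout, and only an explicit delete removes it."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._messages: list[Message] = []

    def send(self, content: str) -> None:
        if len(content.encode("utf-8")) > MAX_MESSAGE_BYTES:
            raise ValueError("message exceeds the 64 KB limit")
        self._messages.append(Message(str(uuid.uuid4()), content, self._clock()))

    def receive(self, visibility_timeout: float = 30.0):
        """Return the first visible message (hiding it), or None if none is visible."""
        now = self._clock()
        for msg in self._messages:
            if msg.visible_at <= now:
                msg.visible_at = now + visibility_timeout
                return msg
        return None

    def delete(self, msg: Message) -> None:
        """Remove a message once its work item has been processed successfully."""
        self._messages = [m for m in self._messages if m.id != msg.id]


# A fake clock makes the visibility timeout easy to demonstrate.
now = [0.0]
work_queue = InMemoryQueue(clock=lambda: now[0])
work_queue.send("resize images/logo.png")

first = work_queue.receive(visibility_timeout=30)  # worker A takes the work item
hidden = work_queue.receive()                      # None: hidden from other workers
now[0] = 31.0                                      # worker A never deleted it; timeout expires
retry = work_queue.receive()                       # the same message is visible again
work_queue.delete(retry)                           # worker B finishes and deletes it
remaining = work_queue.receive()                   # None: the queue is empty
```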

Azure Table storage is a service that stores structured NoSQL data in the cloud, providing a key/attribute store with a schemaless design. Because Table storage is schemaless, it is easy to adapt your data as the needs of your application evolve. Access to Table storage data is fast and cost-effective for many types of applications, and it is typically lower in cost than traditional SQL for similar volumes of data.

You can use Table storage to store flexible datasets like user data for web applications, address books, device information, or other types of metadata your service requires. You can store any number of entities in a table, and a storage account may contain any number of tables, up to the capacity limit of the storage account.

Some example uses are: storing terabytes of data for web applications; storing datasets that do not involve complex relationships and can be denormalized for fast access; and querying data quickly using a clustered index.
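Every entity in Table storage is identified by two system properties, `PartitionKey` and `RowKey`, and the service keeps entities ordered by that pair; that ordering is the clustered index mentioned above. The standard-library sketch below models it with a sorted key list. The class, method names, and sample address-book data are hypothetical; it shows a point query with both keys, a scan of one partition, and entities in the same table carrying different properties because the store is schemaless.

```python
from bisect import bisect_left, insort


class InMemoryTable:
    """Toy key/attribute store keyed like Azure Table storage.

    Entities are dicts that must contain PartitionKey and RowKey; all other
    properties are free-form. Keys are kept sorted by (PartitionKey, RowKey),
    standing in for the clustered index the service maintains on that pair.
    """

    def __init__(self):
        self._keys: list[tuple[str, str]] = []
        self._entities: dict[tuple[str, str], dict] = {}

    def upsert(self, entity: dict) -> None:
        key = (entity["PartitionKey"], entity["RowKey"])
        if key not in self._entities:
            insort(self._keys, key)  # keep the index ordered
        self._entities[key] = dict(entity)

    def get(self, partition_key: str, row_key: str) -> dict:
        """Point query: the cheapest lookup, with both keys known."""
        return self._entities[(partition_key, row_key)]

    def partition(self, partition_key: str) -> list[dict]:
        """Partition scan: entities sharing a PartitionKey, in RowKey order."""
        out = []
        for key in self._keys[bisect_left(self._keys, (partition_key,)):]:
            if key[0] != partition_key:
                break
            out.append(self._entities[key])
        return out


contacts = InMemoryTable()
contacts.upsert({"PartitionKey": "customer-42", "RowKey": "email", "Value": "ada@example.com"})
contacts.upsert({"PartitionKey": "customer-42", "RowKey": "address", "Street": "1 Main St", "City": "Izmir"})
contacts.upsert({"PartitionKey": "customer-7", "RowKey": "email", "Value": "grace@example.com"})

print(contacts.get("customer-42", "email")["Value"])             # ada@example.com
print([e["RowKey"] for e in contacts.partition("customer-42")])  # ['address', 'email']
```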

Talha Turan - Data Engineer
