08 Aug Amazon Redshift – A powerful data warehousing solution in the cloud
TenPoint7’s customers have long found building and maintaining a data warehouse complicated and expensive. Setting up a data warehousing system has traditionally required significant upfront capital expenditure on software and hardware, along with plenty of time and money for planning, procurement, implementation and deployment. After the initial investment, they have to recruit database administrators to look after these complex systems. Moreover, our customers struggle to scale traditional data warehouses as data volumes grow. At TenPoint7, we are looking for solutions to help our customers migrate data warehousing to the cloud in order to increase performance and reduce costs. After a considerable period of investigation, we discovered Amazon Redshift, which is available on Amazon Web Services (AWS). It is a modern alternative to traditional data warehouse platforms. Redshift not only significantly lowers the cost of a data warehouse, but also uses a variety of innovations to achieve higher performance than traditional on-premises implementations, particularly when analyzing large datasets.
Relational databases such as Oracle Database Server, Microsoft SQL Server, MySQL, and PostgreSQL are row-oriented database systems. These systems have traditionally been used for data warehousing, although they are better suited to OLTP than to OLAP. They typically store whole rows in a physical block, so a query has to read every column of every row in the blocks it touches, even when it needs only a few of those columns. In contrast, Redshift uses columnar storage, which places each column in its own set of physical blocks instead of packing whole rows into a block. This increases I/O efficiency for read-heavy analytic queries because less data needs to be loaded from disk. Organizing the data by column also reduces disk space and I/O further, because the values in a column share one data type and therefore compress well. By applying multiple compression techniques, Redshift typically uses less space than row-oriented database systems. It also employs a massively parallel processing (MPP) architecture, which takes advantage of all available resources by parallelizing and distributing SQL operations across nodes. In this way, it dramatically increases the performance of petabyte-scale data warehouses.
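To make the difference between the two layouts concrete, here is a minimal Python sketch. It is only an illustration of the idea, not how Redshift is implemented: the sample table, the function names and the use of run-length encoding as the compression scheme are all assumptions made for the example.

```python
from itertools import groupby

# Hypothetical sample table: (order_id, region, amount).
ROWS = [
    (1, "EU", 10.0),
    (2, "EU", 25.0),
    (3, "EU", 5.0),
    (4, "US", 40.0),
    (5, "US", 15.0),
    (6, "US", 30.0),
]


def to_columns(rows):
    """Pivot row tuples into one list per column (a columnar layout)."""
    return [list(column) for column in zip(*rows)]


def sum_amount_row_store(rows):
    """Row layout: every value of every row is read to sum one column."""
    values_read = 0
    total = 0.0
    for row in rows:
        values_read += len(row)  # the whole row comes off "disk"
        total += row[2]
    return total, values_read


def sum_amount_column_store(columns):
    """Column layout: only the `amount` column is read."""
    amounts = columns[2]
    return sum(amounts), len(amounts)


def run_length_encode(column):
    """Compress runs of equal values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(column)]


columns = to_columns(ROWS)
row_total, row_reads = sum_amount_row_store(ROWS)
col_total, col_reads = sum_amount_column_store(columns)
encoded_region = run_length_encode(columns[1])

print(row_total, row_reads)   # same total, 18 values read
print(col_total, col_reads)   # same total, 6 values read
print(encoded_region)         # six region values stored as two pairs
```

Both functions return the same total, but the columnar version touches a third of the values, and the `region` column shrinks from six values to two pairs because similar values sit next to each other.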
Durability and Availability
Amazon Redshift uses replication and continuous backups to enhance availability and improve data durability. It automatically detects and replaces any failed node in the data warehouse cluster, and the replacement node is made available immediately so we can resume querying our data as quickly as possible.
We can easily change the number and type of nodes with just a few clicks when our performance or capacity needs change. Redshift enables us to start with a single node, 160GB data warehouse and scale up all the way to a petabyte or more.
Redshift is based on PostgreSQL, which makes it easier for those already familiar with relational databases and SQL to develop solutions. We can run our existing queries with little or no modification. It integrates with many popular BI and ETL tools via JDBC and ODBC drivers. We can load data from and export data to multiple sources, including Amazon S3, EMR and DynamoDB. Loading streaming data into Redshift using Kinesis enables near real-time analytics with existing BI tools and dashboards. Moreover, Redshift automates most common administrative tasks, such as configuration, maintenance, monitoring and provisioning. This makes it much easier to manage than a traditional data warehouse.
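As a rough illustration of loading data from Amazon S3, the sketch below builds a Redshift `COPY` statement in Python. The table name, bucket path and IAM role ARN are placeholders invented for the example, and `build_copy_statement` is a hypothetical helper, not part of any AWS library. The resulting SQL would be run through whichever PostgreSQL-compatible, JDBC or ODBC client we already use.

```python
def build_copy_statement(table, s3_uri, iam_role_arn):
    """Return a Redshift COPY statement that loads CSV files from Amazon S3.

    The values are interpolated directly into the SQL text, so they should
    come from trusted configuration, never from user input.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV;"
    )


# Placeholder values for illustration only.
statement = build_copy_statement(
    table="sales",
    s3_uri="s3://example-bucket/sales/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
print(statement)
```

Because `COPY` reads the files in parallel across the cluster's nodes, it is generally a better fit for bulk loads than issuing many individual `INSERT` statements.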
We can secure our data by running Redshift inside a virtual private cloud using Amazon Virtual Private Cloud (VPC). It also supports SSL-enabled connections between our application and Redshift. We can protect our data at rest by enabling encryption, in which case Redshift encrypts each block with AES-256 as it is written to disk.
Like many other AWS services, Redshift requires no long-term commitments or upfront costs. We pay only for what we use, and charges are based on the size and number of nodes in our cluster. This approach helps us reduce the expense and complexity of planning and purchasing data warehouse capacity ahead of our needs.
In conclusion, Amazon Redshift has changed our thinking about migrating on-premises data warehousing solutions to the cloud by providing simplicity, performance and cost-effectiveness. TenPoint7 is evaluating Redshift as a component of our cloud-based Analytics-as-a-Service (AaaS) platform, which provides a simple, accelerated and attainable path for businesses to become data-driven in a flexible, secure, reliable, and scalable manner.