Doris is an open source, SQL-based massively parallel processing (MPP) analytical data warehouse that was under development at the Apache Incubator.
Last week, Doris achieved top-level project status, which according to the Apache Software Foundation (ASF) means that “it has proven its ability to be properly self-governed.”
Version 1.0 of the data warehouse was released recently. It was the project’s eighth release while in the incubator (alongside six connector releases). Doris is built to support online analytical processing (OLAP) workloads, which are often used in data science scenarios.
Doris, originally known as Palo, was born inside Chinese internet search giant Baidu as a data warehousing system for its advertisement business before being open sourced in 2017 and entering the Apache Incubator in 2018.
Doris has roots in Apache Impala and Google Mesa
According to the Apache Software Foundation, Doris is based on the integration of Google Mesa and Apache Impala, an open source MPP SQL query engine that was developed in 2012 on the underpinnings of Google F1.
Mesa, a highly scalable analytic data warehousing system designed around 2014, was used to store critical measurement data related to Google’s internet advertising business.
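Part of what Doris inherited from Mesa is a particular data model. As Google described Mesa, each table has key columns and value columns, and every value column carries an associative aggregation function, so rows arriving with the same key are merged rather than stored side by side. The sketch below is a minimal in-memory illustration of that merge under assumed, hypothetical columns (date, campaign ID, clicks, cost, latency); it is not Mesa’s or Doris’s actual code, although Doris’s aggregate key model is generally described as following the same idea.

```python
import operator
from typing import Callable, Dict, Iterable, Sequence, Tuple

Key = Tuple[str, int]          # hypothetical key columns: (date, campaign_id)
Values = Tuple[int, int, int]  # hypothetical values: (clicks, cost_cents, max_latency_ms)

# One aggregation function per value column: SUM, SUM, MAX.
# A Mesa-style model requires each function to be associative.
AGGREGATORS: Sequence[Callable[[int, int], int]] = (operator.add, operator.add, max)


def merge_rows(rows: Iterable[Tuple[Key, Values]],
               aggregators: Sequence[Callable[[int, int], int]]) -> Dict[Key, Values]:
    """Collapse rows sharing a key by folding each value column with its aggregator."""
    merged: Dict[Key, list] = {}
    for key, values in rows:
        if key not in merged:
            merged[key] = list(values)
            continue
        current = merged[key]
        for i, (agg, value) in enumerate(zip(aggregators, values)):
            current[i] = agg(current[i], value)
    return {key: tuple(vals) for key, vals in merged.items()}


# Two updates for campaign 42 on the same day collapse into one row.
updates = [
    (("2022-06-01", 42), (10, 150, 80)),
    (("2022-06-01", 42), (5, 50, 120)),
    (("2022-06-01", 7), (3, 30, 60)),
]
print(merge_rows(updates, AGGREGATORS))
# {('2022-06-01', 42): (15, 200, 120), ('2022-06-01', 7): (3, 30, 60)}
```

Because each aggregator is associative, merges can happen incrementally (at load time, during compaction, or at query time) and still produce the same totals.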
According to its developers, both at Baidu and at the Apache Incubator, Doris offers a simple architecture while providing high availability, reliability, fault tolerance, and scalability.
“The simplicity (of developing, deploying and using) and meeting many data serving requirements in single system are the main features of Doris,” the Apache Software Foundation said in a statement, adding that the data warehouse supports multidimensional reporting, user profiling, ad hoc queries, and real-time dashboards.
Other features of Doris include columnar storage, parallel execution, vectorization technology, query optimization, ANSI SQL, and integration with big data ecosystems via connectors for Apache Flink, Apache Hive, Apache Hudi, Apache Iceberg, Apache Spark, and Elasticsearch, among other systems.
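Columnar storage matters for analytics because a typical OLAP query reads a few columns across many rows. The toy sketch below, using a hypothetical three-column table held in memory, contrasts a row layout with a column layout for `SUM(revenue_cents)`. It only models which values a scan has to touch; it says nothing about Doris’s real on-disk format, compression, or indexes.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical sample table: (user_id, country, revenue_cents).
rows: List[Tuple[int, str, int]] = [
    (1, "CN", 1200),
    (2, "US", 800),
    (3, "CN", 500),
]
names = ["user_id", "country", "revenue_cents"]


def to_columns(records: Sequence[tuple], column_names: List[str]) -> Dict[str, list]:
    """Transpose row-oriented records into one contiguous list per column."""
    return {name: [rec[i] for rec in records] for i, name in enumerate(column_names)}


columns = to_columns(rows, names)

# Row layout: in a row store, reading one field means reading whole records,
# so we count all 3 fields of every row as touched.
row_total = sum(rec[2] for rec in rows)
values_touched_row = sum(len(rec) for rec in rows)

# Column layout: only the single column the query needs is scanned.
col_total = sum(columns["revenue_cents"])
values_touched_col = len(columns["revenue_cents"])

print(row_total, col_total)                    # 2500 2500
print(values_touched_row, values_touched_col)  # 9 3
```

The answer is identical either way; the column layout simply avoids touching the two columns the query never uses, and the gap grows with table width.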
Uptake of open source databases forecast to grow
Uptake of enterprise-grade open source databases is expected to grow. In its State of the Open-Source DBMS Market, 2019 report, the consulting firm Gartner predicted that by the end of 2022, more than 70% of new in-house applications will be developed on an open source database management system (OSDBMS) or an OSDBMS-based database platform-as-a-service (dbPaaS).
In addition, as data proliferates and businesses increasingly need real-time analytics, a simple, open source MPP database seems to be the need of the hour.
“As data volumes have grown, MPP databases became the only realistic way to process data quickly enough or cheaply enough to meet organizations’ demands,” said David Menninger, research director at Ventana Research.
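The core MPP idea behind that claim is to split the data across nodes, let every node aggregate its own share in parallel, and then merge the small partial results. The sketch below shows this two-phase aggregation for a hypothetical `SUM(revenue_cents) GROUP BY country` query. Threads stand in for separate servers, and round-robin partitioning is an assumption made for simplicity. Because of Python’s global interpreter lock, the sketch models the data flow rather than demonstrating a real speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # hypothetical schema: (country, revenue_cents)


def local_aggregate(shard: List[Row]) -> Counter:
    """Phase 1 (on each node): partial SUM(revenue_cents) GROUP BY country."""
    partial: Counter = Counter()
    for country, revenue in shard:
        partial[country] += revenue
    return partial


def mpp_sum_by_country(rows: List[Row], n_nodes: int = 4) -> Dict[str, int]:
    # Round-robin distribution: rows for one country may land on several nodes.
    shards = [rows[i::n_nodes] for i in range(n_nodes)]
    # Threads stand in for independent nodes, each scanning only its own shard.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(local_aggregate, shards))
    # Phase 2 (coordinator): merge the partial aggregates into the final result.
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


sales = [("CN", 1200), ("US", 800), ("CN", 500), ("DE", 300), ("US", 200)]
print(mpp_sum_by_country(sales))  # {'CN': 1700, 'US': 1000, 'DE': 300}
```

Only the per-node partial sums travel to the coordinator, not the raw rows. That is why adding nodes, including inexpensive cloud instances, lets this kind of system scale with data volume.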
Cloud architecture fuels interest in MPP databases
Another trend fueling MPP databases is the availability of relatively inexpensive cloud-based server instances. These can be used as part of an MPP configuration, which eliminates the need to procure and install the physical hardware these systems run on, Menninger said.
Making a case for Doris, Menninger said that while there are many MPP database options, some of them open source, there isn’t really an open source MPP alternative to MySQL.
“MySQL itself and MariaDB have been extended to support larger analytical workloads, but they were initially designed for transaction processing,” Menninger said, adding that the open source, PostgreSQL-based database Greenplum and hyperscaler services such as Google BigQuery, Amazon Redshift, and Microsoft Synapse could be considered rivals to Doris.
In addition, ClickHouse, Apache Druid, and Apache Pinot could also be considered rivals, said Sanjeev Mohan, former research vice president for big data and analytics at Gartner.
According to the Apache Foundation, using Doris could have multiple advantages, such as architectural simplicity and faster query times.
One reason behind Doris’ simplicity is that it does not depend on multiple external components for tasks such as cluster management, synchronization, and communication. Its fast query times can be attributed to vectorization, a technique that lets a program or algorithm operate on many values at once rather than on a single value at a time.
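To make the vectorization point concrete, the sketch below evaluates a hypothetical query, `SUM(price) WHERE qty > 10`, in two ways. The first handles one row per loop iteration. The second hands each step a whole batch (vector) of column values, the shape that vectorized engines use. The batch size of 1024 and the function names are illustrative assumptions, not Doris internals. Python itself won’t show the speedup; in a compiled engine, tight loops over batches let the CPU use SIMD instructions and amortize per-row overhead.

```python
from typing import Iterator, List

BATCH_SIZE = 1024  # hypothetical batch size, chosen for illustration


def row_at_a_time(prices: List[int], qtys: List[int]) -> int:
    """SUM(price) WHERE qty > 10, evaluated one row per loop iteration."""
    total = 0
    for price, qty in zip(prices, qtys):
        if qty > 10:
            total += price
    return total


def batches(column: List[int], size: int) -> Iterator[List[int]]:
    """Slice a column into fixed-size vectors."""
    for start in range(0, len(column), size):
        yield column[start:start + size]


def filter_sum_batch(prices: List[int], qtys: List[int]) -> int:
    """One call handles a whole vector: build a selection mask, then sum the kept values."""
    mask = [qty > 10 for qty in qtys]
    return sum(price for price, keep in zip(prices, mask) if keep)


def vectorized(prices: List[int], qtys: List[int], size: int = BATCH_SIZE) -> int:
    """The same query, with each step processing a batch of column values at once."""
    return sum(
        filter_sum_batch(p, q)
        for p, q in zip(batches(prices, size), batches(qtys, size))
    )


prices = [5, 7, 9]
qtys = [11, 3, 20]
print(row_at_a_time(prices, qtys), vectorized(prices, qtys))  # 14 14
```

Both versions return the same answer. What changes is the unit of work: per-value dispatch in the first, per-batch dispatch in the second.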
Another benefit of the data warehouse, according to the developers at the Apache Foundation, is Doris’ ultra-high concurrency support, meaning it can handle requests from tens of thousands of users to process data and gain insights from the database at the same time.
The need for high concurrency has increased because most organizations now give employees broad access to data so they can derive insights from it, rather than restricting analytics to C-suite executives.
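Most of this article is reprinted from InfoWorld; the original author is Anirban Ghoshal. It is provided for readers’ reference only. If it infringes your intellectual property or other rights, please contact me with evidence and I will remove it.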