Wednesday, November 14, 2018

High Availability

English translation:

Within a single data center, eliminating single points of failure in database and application services is relatively easy. High availability across data centers is more involved, because it has to account for network topology, latency or interruptions, data consistency, and processing efficiency.

We built active-active data centers, with each data center carrying a different part of the business. Services are divided into two classes by response priority: real-time, transaction-related services, which must not be interrupted for more than a matter of minutes; and near-real-time, data-analysis services, which must not be interrupted for more than a matter of hours. The main data center handles real-time online transaction business from POS, WeChat, and Android terminals. The disaster recovery data center handles near-real-time workloads such as data analysis, scheduled tasks, log analysis, and Storm. Using pgq and streaming replication, the data centers keep two complete copies of the transaction data.

When a data center or the network fails, the business data each data center needs is largely intact, so there are two options: a) one data center takes over all business; b) data stops synchronizing, but each data center keeps operating normally, and synchronization resumes once the fault is resolved. Tiering disaster recovery by service response class makes better use of the active-active data centers. Depending on the fault, the response is automatic or manual.


一个数据中心内,消除数据库和应用服务的单点故障比较容易,但跨数据中心的高可用性,要考虑到网络拓扑、延时或中断、数据一致、处理效率等问题,比较繁琐。我们建立双活数据中心,不同数据中心分担不同的业务。按照业务响应的优先级别划分为两类:交易相关的实时业务,不允许超过分钟级中断;数据分析相关的准实时业务,不允许超过小时级中断。主数据中心处理POS、微信、安卓终端等在线交易实时相关的业务,灾备数据中心处理数据分析、定时任务、日志分析、Storm等准实时业务,通过pgq 和流复制各数据中心保持两份完整的交易数据。当数据中心或网络出现异常时,每一个数据中心所需的业务数据基本完整,此时有两个选择:a) 由一个数据中心全面接管所有业务。 b) 数据不同步,但各数据中心均可正常使用,异常解决后,继续同步数据。按业务响应分级容灾,可以更合理的利用双活数据中心,根据不同的异常情况,自动或人工做出响应。
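
The tiered response described above can be sketched as a small decision function. This is only an illustration, not the system's actual logic: the names `Tier`, `Action`, `Service`, and `decide` are hypothetical, the limits of 60 and 3600 seconds are assumed stand-ins for "minutes" and "hours", and starting takeover once half the limit has elapsed is an arbitrary safety margin.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Tier(Enum):
    """Service classes from the post; values are assumed interruption limits in seconds."""
    REAL_TIME = 60          # transaction services (POS, WeChat, Android terminals)
    NEAR_REAL_TIME = 3600   # analytics, scheduled tasks, log analysis, Storm


class Action(Enum):
    WAIT = "wait"                  # outage is still well inside the tier's limit
    TAKE_OVER = "take_over"        # option a): the surviving data center runs the service
    RUN_DEGRADED = "run_degraded"  # option b): keep serving locally, resync later


@dataclass(frozen=True)
class Service:
    name: str
    tier: Tier
    home: str  # data center that normally runs this service


# Assumed safety margin: act once this fraction of the limit has been used.
TAKEOVER_FRACTION = 0.5


def decide(service: Service, down_dc: Optional[str], outage_seconds: float) -> Action:
    """Pick a response for one service.

    down_dc is the name of the failed data center, or None when both data
    centers are up and only the link between them is broken.
    """
    # The service's own data center is still running: keep serving from the
    # local copy of the data and let replication catch up after the fault.
    if down_dc is None or service.home != down_dc:
        return Action.RUN_DEGRADED

    # The service's home data center is down: move it before the tier's
    # interruption limit is reached.
    if outage_seconds >= service.tier.value * TAKEOVER_FRACTION:
        return Action.TAKE_OVER
    return Action.WAIT
```

For example, `decide(Service("pos", Tier.REAL_TIME, "main"), "main", 45)` selects a takeover, while an analytics job homed in the disaster recovery data center would wait through the same 45-second outage. Whether the chosen action is then carried out automatically or handed to an operator is left open here, as it is in the post.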