Apache Beam (https://beam.apache.org/get-started/quickstart-py/)
Used for running pipelines such as extract, transform and load (ETL) in parallel.
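A rough sketch of the ETL idea in plain Python, to make the extract/transform/load split concrete. This is not Beam's API: it uses only the standard library's thread pool, and the sample rows plus the names `extract`, `transform`, `load` and `run_pipeline` are made up for illustration. Beam expresses the same stages as pipeline transforms and hands the parallelism to a runner.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical input: "name,amount" lines standing in for a real source.
RAW_ROWS = ["alice,10", "bob,5", "alice,7", "bob,1"]


def extract(rows):
    """Extract: split each raw line into a (name, amount) pair of strings."""
    return [tuple(row.split(",")) for row in rows]


def transform(record):
    """Transform: title-case the name and convert the amount to int."""
    name, amount = record
    return name.title(), int(amount)


def load(records):
    """Load: sum amounts per name into a dict (stands in for a real sink)."""
    totals = {}
    for name, amount in records:
        totals[name] = totals.get(name, 0) + amount
    return totals


def run_pipeline(rows, workers=4):
    """Run extract, then transform each record in parallel, then load."""
    records = extract(rows)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order while running transforms concurrently.
        transformed = list(pool.map(transform, records))
    return load(transformed)


if __name__ == "__main__":
    print(run_pipeline(RAW_ROWS))
```

The transform step is the one that parallelises cleanly because each record is handled independently; the final aggregation needs all records, which is why it runs after the pool has finished.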
In-memory data structure store for DB and cache
Fluentd - https://www.fluentd.org/
Unified logging layer for collecting and streaming logs
Zipkin - zipkin.io
Zipkin is a distributed tracing system. It helps gather timing data needed to troubleshoot latency problems in service architectures. Features include both the collection and lookup of this data.
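To illustrate what "collection and lookup of timing data" means, here is a minimal in-process sketch. It does not use Zipkin's client libraries or wire format; `Span`, `InMemoryCollector` and `traced` are hypothetical names, and a real setup would report spans to a Zipkin server instead of a Python list.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class Span:
    """One timed operation belonging to a trace."""
    trace_id: str
    name: str
    start: float
    duration: float = 0.0


class InMemoryCollector:
    """Stores finished spans and looks them up by trace id."""

    def __init__(self):
        self._spans = []

    def record(self, span):
        self._spans.append(span)

    def lookup(self, trace_id):
        return [s for s in self._spans if s.trace_id == trace_id]


@contextmanager
def traced(collector, trace_id, name):
    """Time the enclosed block and record it as a span when it exits."""
    span = Span(trace_id, name, time.perf_counter())
    try:
        yield span
    finally:
        span.duration = time.perf_counter() - span.start
        collector.record(span)


if __name__ == "__main__":
    collector = InMemoryCollector()
    trace_id = uuid.uuid4().hex
    with traced(collector, trace_id, "frontend"):
        with traced(collector, trace_id, "backend"):
            time.sleep(0.01)  # simulated slow call
    for span in collector.lookup(trace_id):
        print(f"{span.name}: {span.duration * 1000:.1f} ms")
```

Sharing one trace id across nested calls is what lets a lookup show where the time went: the outer span's duration includes the inner one, so a slow backend shows up inside a slow frontend request.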
Greenplum - big data platform built on Postgres that uses massively parallel processing (MPP); it competes directly with Amazon Redshift, Microsoft Azure, Alibaba's AnalyticDB, Teradata and Google BigQuery.
Delphix - DB virtualisation
MooseFS - open source distributed file system