InfluxDB 3 & Rust

11/08/23 • 56 min

InfluxDB 3.0 Rewrite

InfluxDB, a time series database, underwent a major rewrite to create InfluxDB 3.0, also known as IOx. The decision to rewrite the database was driven by the need for strict control over memory management and high performance. The project started as a research endeavor and gradually gained traction within the company. The team decided to build around Apache Software Foundation projects such as Apache Arrow and Apache DataFusion. In April 2022, InfluxDB 3.0 was officially announced, aiming to improve performance, scalability, and cost-effectiveness for users.

IOx Database Engine

The new database engine, IOx, is designed to handle various types of observability and monitoring data, including metrics, traces, and logs. It aims to provide a single store for all these signals, eliminating the need for separate databases. However, querying the data efficiently is still a challenge that the team is working on. The goal is to make IOx the go-to solution for storing and querying observational data, not only for server infrastructure monitoring but also for sensor data use cases.

Challenges and Considerations

Working with logs, tracing, and structured events in time series databases poses challenges. The dynamic and inconsistent nature of schemas in logs and tracing use cases can make extracting structured fields difficult. Time series databases also have limitations in handling tracing front ends and require an index to map trace IDs to individual traces. While metrics, logs, and traces are the gold standard for observability, there is room for improvement in terms of usability and performance.
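
To make the trace ID point concrete, here is a minimal Rust sketch of such an index. It assumes spans are stored as rows ordered by time, so finding one trace would otherwise mean scanning every row. The `Span` and `TraceIndex` types and their fields are hypothetical and are not taken from InfluxDB or IOx.

```rust
use std::collections::HashMap;

/// One span as it might sit in a time-ordered table.
/// Field names are illustrative, not InfluxDB's schema.
#[derive(Debug, Clone, PartialEq)]
struct Span {
    timestamp_ns: i64,
    trace_id: String,
    name: String,
}

/// Secondary index from trace ID to row positions in the span table.
struct TraceIndex {
    rows_by_trace: HashMap<String, Vec<usize>>,
}

impl TraceIndex {
    /// Build the index with a single pass over the time-ordered rows.
    fn build(spans: &[Span]) -> Self {
        let mut rows_by_trace: HashMap<String, Vec<usize>> = HashMap::new();
        for (row, span) in spans.iter().enumerate() {
            rows_by_trace
                .entry(span.trace_id.clone())
                .or_default()
                .push(row);
        }
        TraceIndex { rows_by_trace }
    }

    /// Return every span of one trace without scanning the whole table.
    /// An unknown trace ID yields an empty vector.
    fn lookup<'a>(&self, spans: &'a [Span], trace_id: &str) -> Vec<&'a Span> {
        self.rows_by_trace
            .get(trace_id)
            .map(|rows| rows.iter().map(|&row| &spans[row]).collect())
            .unwrap_or_default()
    }
}
```

Calling `TraceIndex::build(&spans)` once and then `lookup(&spans, "some-trace-id")` returns the spans of that trace in their stored (time) order. A production index would also have to cope with data that is written continuously and spread across many files, which this sketch ignores.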

Flux and DataFusion

Flux, a scripting language developed for InfluxDB 2.0, addresses user requests for more complex query logic and integration with third-party systems. InfluxDB 3.0 incorporates a parser written in Rust that translates SQL queries into a DataFusion query plan, so queries benefit from DataFusion's performance optimizations. However, bringing Flux to InfluxDB 3.0 proved challenging because of Flux's large surface area and the team's limited time and resources. Updating the Flux engine to use the 3.0 native API could potentially resolve these issues.
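
The idea of turning query text into a plan that an engine can optimize and run can be sketched in a few lines of Rust. This is a toy under stated assumptions: the `Plan` enum and the `to_plan` function are hypothetical, they handle only `SELECT <cols> FROM <table> [WHERE <column> = <value>]` with whitespace-separated tokens, and they do not use DataFusion's actual API or InfluxDB's parser.

```rust
/// Toy logical plan; a stand-in for a real engine's plan type.
#[derive(Debug, PartialEq)]
enum Plan {
    /// Read every row of a table.
    Scan { table: String },
    /// Keep only rows where `column` equals `value`.
    Filter { column: String, value: String, input: Box<Plan> },
    /// Keep only the named columns.
    Project { columns: Vec<String>, input: Box<Plan> },
}

/// Translate `SELECT a,b FROM t [WHERE c = v]` into a plan tree.
/// Columns must be comma-separated without spaces in this sketch.
fn to_plan(query: &str) -> Result<Plan, String> {
    let tokens: Vec<&str> = query.split_whitespace().collect();
    if tokens.len() < 4
        || !tokens[0].eq_ignore_ascii_case("select")
        || !tokens[2].eq_ignore_ascii_case("from")
    {
        return Err("expected: SELECT <cols> FROM <table>".to_string());
    }

    let columns = tokens[1].split(',').map(str::to_string).collect();
    let mut plan = Plan::Scan { table: tokens[3].to_string() };

    // An optional WHERE clause wraps the scan in a filter node.
    match &tokens[4..] {
        [] => {}
        [kw, column, "=", value] if kw.eq_ignore_ascii_case("where") => {
            plan = Plan::Filter {
                column: column.to_string(),
                value: value.to_string(),
                input: Box::new(plan),
            };
        }
        _ => return Err("expected: WHERE <column> = <value>".to_string()),
    }

    // The projection sits on top, so it is applied last.
    Ok(Plan::Project { columns, input: Box::new(plan) })
}
```

For example, `to_plan("SELECT host,usage FROM cpu WHERE region = us-west")` produces a `Project` over a `Filter` over a `Scan` of `cpu`. Once a query is in tree form like this, any front end that emits the same plan type can share the engine's optimizer and execution, which is the benefit the episode attributes to building on DataFusion.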

InfluxDB Development and Open Source Licensing

InfluxData is focused on improving the core query engine of InfluxDB and enhancing its capabilities and performance. The company has created a separate community fork of Flux to allow collaboration on its development. Paul Dix, InfluxData's co-founder, believes that true open source should be about freedom and intends to keep InfluxDB 3 a permissively licensed project. He discusses HashiCorp's recent license change and the developer community's growing distrust of VC-backed open source projects. Putting InfluxDB into a foundation may not be feasible due to the lack of multiple contributors.

Previous Episode

Trust and Validation in AI

Here are 5 key takeaways from this episode that you don't want to miss:

1️⃣ The People Problem: Laura Santamaria raises an important concern about verifying AI-generated outputs and tackling the challenge of the "people problem" in AI development.

2️⃣ Verifying Data Authenticity: JJ discusses the challenge of proving that a data blob originated from a specific model and how this issue is being addressed by companies like IBM through pile cleaning and legal penalties.

3️⃣ AI Misconceptions: We debunk some common misconceptions about AI, including the belief that it is an all-knowing fact machine.

4️⃣ Trusted AI: IBM's approach to building trusted models, with dedicated engineers responsible for cleaning and verifying data, is explained. Plus, we discover IBM's partnerships with Hugging Face to leverage the open-source ecosystem.

5️⃣ The Impact of AI: We delve into the potential positive and negative implications of AI, and how the rapid advancement of this technology presents challenges with trust and validation.

💡 Fun Fact: Did you know that 95% of open-source language models are trained on a data set called "The Pile," which contains pirated and copyrighted material? Discover why this has implications for copyright and patent laws!

As always, the conversation in this episode is engaging and eye-opening. JJ Asghar provides insightful perspectives and sheds light on the future of AI development. Don't miss out on the valuable information shared!

Questions We Covered

1. How can the problem of untrusted data in AI models be effectively addressed?
2. Should companies like OpenAI and Microsoft be required to provide their data sets for verification purposes? Why or why not?
3. What are the potential risks and challenges associated with using AI technology without proper regulation?
4. Should AI creations be eligible for copyright protection? Why or why not?
5. How can we ensure the accuracy and trustworthiness of AI-generated data, especially when it comes to extracting information from sources like PDFs?
6. What are some potential positive impacts of AI technology, and how can we maximize its benefits while minimizing its negative implications?
7. How can the rapid advancement of AI technology be balanced with the need for trust and validation?
8. In what ways do copyright and patent laws need to evolve to accommodate AI technology?
9. What are the implications of China having its own set of laws and approaches to technology that may differ from other countries?
10. How can individuals navigate and better understand the AI space in order to make informed decisions and contributions?

Next Episode

From Kubernetes to Cloud Run: Chainguard's Journey

Exploring Cloud Migrations & Infrastructure Strategies with Jason Hall of Chainguard

In this episode of the Cloud Native Compass podcast, hosts David Flanagan and Laura Santamaria chat with Jason Hall, Principal Engineer at Chainguard. They delve into Chainguard's migration from Kubernetes and Knative to Cloud Run, discussing the reasons behind the move, cost considerations, managing technical debt, and best practices for infrastructure management. The conversation also covers the benefits of using Cloud Run, their strategic use of BigQuery for event logging, and insights into least access security models. Tune in to learn more about navigating cloud-native environments and optimizing infrastructure.

Creators & Guests