Machine Learning driven Database Optimization with Luigi Nardi from DBtune
Cloud Commute · June 28, 2024
18
00:25:57 · 23.77 MB


In this week's episode of Simplyblock's Cloud Commute podcast, host Chris Engelbert interviews Luigi Nardi, founder and CEO of DBtune. Luigi provides insights into the complexities of database optimization and the significant role of machine learning in improving system efficiency.

In this episode of Cloud Commute, Chris and Luigi discuss:

  • Introduction to DBtune: Machine learning for database optimization
  • Postgres tuning: Automating configuration for performance and sustainability
  • Security and privacy concerns in database optimization and how DBtune addresses them
  • Future trends in databases: Vector databases, machine learning, and database-as-a-service (DBaaS)

Interested in learning more about the cloud infrastructure stack, like storage, security, and Kubernetes? Head to our website (www.simplyblock.io/cloud-commute-podcast) for more episodes, and follow us on LinkedIn (www.linkedin.com/company/simplyblock-io). You can also check out the detailed show notes on YouTube (www.youtube.com/watch?v=lkquofjg9zo).

You can find Luigi Nardi on X @luiginardi and LinkedIn: /nardiluigi.

About simplyblock:

Simplyblock is an intelligent database storage orchestrator for IO-intensive workloads in Kubernetes, including databases and analytics solutions. It uses smart NVMe caching to speed up read I/O latency and queries. A single system connects local NVMe disks, GP3 volumes, and S3, making it easier to handle storage capacity and performance. With the benefits of thin provisioning, storage tiering, and volume pooling, your database workloads get better performance at lower cost without changes to existing AWS infrastructure.

👉 Get started with simplyblock: https://www.simplyblock.io/buy-now

🏪 simplyblock AWS Marketplace: https://aws.amazon.com/marketplace/seller-profile?id=seller-fzdtuccq3edzm


01:00:00
So in terms of like benefits,

01:00:02
of course, if you optimize your

01:00:03
queries, you rewrite your

01:00:04
queries, you can get, you know,

01:00:06
two or three orders of magnitude

01:00:07
performance improvement, which is

01:00:09
which is really, really great.

01:00:10
If you optimize the configuration

01:00:12
of your system, you can get, you

01:00:14
know, an order of magnitude in

01:00:15
terms of performance improvement.

01:00:17
And that's, that's still very,

01:00:18
very significant.

01:00:23
You're listening to simplyblock's Cloud Commute Podcast,

01:00:26
your weekly 20 minute

01:00:27
podcast about cloud technologies,

01:00:29
Kubernetes, security,

01:00:30
sustainability, and more.

01:00:32
Hello, everyone. Welcome back to

01:00:33
this week's episode

01:00:34
of simplyblock's Cloud

01:00:35
Commute podcast. This week I

01:00:37
have Luigi with me. Luigi,

01:00:41
obviously, from Italy.

01:00:44
I don't think he has anything to

01:00:45
do with Super Mario,

01:00:46
but he can tell us about

01:00:48
that himself. So welcome, Luigi.

01:00:51
Sorry for the really bad joke.

01:00:54
Glad to be here, Chris.

01:00:56
So maybe you start with

01:00:59
introducing yourself. Who are you?

01:01:02
We already know where you're from,

01:01:03
but I'm not sure if you're

01:01:04
actually residing in Italy. So

01:01:06
maybe just tell us a

01:01:07
little bit about you.

01:01:08
Sure. Yes, I'm originally Italian.

01:01:10
I left the country to explore and

01:01:14
study abroad a little while ago.

01:01:17
So in 2006, I moved to France and

01:01:21
stayed there for a little while.

01:01:24
I spent almost seven years in

01:01:25
total in France eventually. I did

01:01:27
my PhD program there in Paris and

01:01:31
worked in a company as a software

01:01:33
engineer as well.

01:01:36
Then I moved to the UK for a few

01:01:39
years, did a postdoc at

01:01:42
Imperial College London in

01:01:43
downtown London,

01:01:44
and then moved to the US. So lived

01:01:47
in California, Palo Alto more

01:01:49
precisely for a few years.

01:01:51
And then in 2019, I came back to

01:01:53
Europe and established my

01:01:55
residency in Sweden.

01:01:59
Right. Okay. So you're in Sweden

01:02:01
right now.

01:02:02
That's correct.

01:02:04
Oh, nice. Nice. How's

01:02:06
the weather? Is it still cold?

01:02:08
It's great. Everybody thinks that

01:02:09
Sweden has very bad weather, but

01:02:11
Sweden is a very,

01:02:12
very long country.

01:02:13
So if you reside in the south,

01:02:16
actually the weather is pretty,

01:02:17
pretty decent. It

01:02:18
doesn't snow very much.

01:02:20
That is very

01:02:21
true. I actually love

01:02:23
Stockholm, a very beautiful city.

01:02:25
All right. One thing you haven't

01:02:27
mentioned, you're actually the

01:02:28
founder and CEO of DBtune.

01:02:32
So you left out the best part, I

01:02:34
guess. Maybe tell us a little bit

01:02:36
about DBtune now.

01:02:39
Sure. DBtune is a company that is

01:02:41
about four years old now. It's a

01:02:43
spinoff from Stanford University.

01:02:46
It's the commercialization of about

01:02:50
a decade of research and

01:02:51
development in academia.

01:02:54
So we were working on the

01:02:56
intersection between machine

01:02:57
learning and computer systems

01:02:59
and specifically the use of

01:03:01
machine learning to optimize

01:03:02
computer systems.

01:03:03
This is an area that in around

01:03:05
2018 or 2019 received a new name,

01:03:09
which is MLSys: machine

01:03:10
learning and systems.

01:03:12
And so this new area is quite

01:03:15
prominent these days. And

01:03:16
you can do very beautiful things

01:03:19
with the combination

01:03:20
of these two pieces.

01:03:24
So DBtune is specifically focusing

01:03:26
on using machine learning to

01:03:28
optimize computer systems,

01:03:30
specifically in the

01:03:31
computer system area.

01:03:33
We are optimizing databases, the

01:03:34
database management

01:03:35
systems more specifically.

01:03:37
And the idea is that you can

01:03:40
automate the process of tuning

01:03:44
databases. At least we are

01:03:46
focusing on the optimization of

01:03:48
the parameters of the database

01:03:50
management systems, the parameters

01:03:52
that govern the runtime system.

01:03:53
Meaning the way the disk, the RAM

01:03:57
and the CPU interact with each

01:04:00
other. So you take the von Neumann

01:04:02
model and you try to make it as

01:04:04
efficient as possible through

01:04:06
optimizing the parameters that

01:04:10
govern that interaction.

01:04:12
And so by doing that you automate

01:04:14
the process, which means that

01:04:16
database engineers, database

01:04:18
administrators can focus on other

01:04:21
tasks that are equally important

01:04:24
or even more important.

01:04:26
And at the same time you get great

01:04:27
performance, you can reduce your

01:04:30
cloud costs as well. If you're

01:04:32
running in the cloud in an

01:04:33
efficient way you can

01:04:35
optimize the cloud costs.

01:04:36
And at the same time you get also

01:04:38
a check on your greenops, meaning

01:04:42
sustainability aspect of it. So

01:04:44
this is one of the examples I

01:04:46
really like of how you can be an

01:04:48
engineer and provide quite a big

01:04:50
contribution in terms of

01:04:52
sustainability as well because you

01:04:54
can connect these two things by

01:04:56
making your software run more

01:04:57
efficiently and then scaling down

01:04:59
your operations.

01:05:02
That is true. And it's, yeah, I've

01:05:05
never thought about that, but

01:05:06
sure. I mean, if I get my queries

01:05:08
to run more efficiently and use less

01:05:11
compute time and compute power,

01:05:13
huh, that is actually a good

01:05:15
thing. Now I'm

01:05:16
feeling much better.

01:05:20
I'm feeling much better too. Since

01:05:22
we started talking a little bit

01:05:25
more about this, we have a blog

01:05:28
post that will be released pretty

01:05:29
soon about this

01:05:30
very specific topic.

01:05:31
I think this connection between

01:05:33
making software run efficiently

01:05:36
and the downstream effects of that

01:05:42
efficiency, both on your cost,

01:05:44
infrastructure cost, but also on

01:05:46
the efficiency of your operations.

01:05:49
It's often

01:05:49
underestimated, I would say.

01:05:54
Yeah, that's fair. It would be

01:05:57
nice if you, when it's published,

01:05:59
just send me over the link and I'm

01:06:01
putting it into the show notes

01:06:03
because I think that will be

01:06:04
really interesting

01:06:05
to a lot of people.

01:06:06
As you said, specifically for

01:06:08
developers that would otherwise

01:06:10
have a hard time contributing anything

01:06:13
in terms of sustainability.

01:06:16
You mentioned database systems,

01:06:19
but I think DBtune specifically is

01:06:21
focused on Postgres, isn't it?

01:06:25
Right. Today we are focusing on

01:06:26
Postgres. As a proof of concept,

01:06:29
though, we have applied similar

01:06:31
technology to five different

01:06:33
database management systems,

01:06:35
including relational and

01:06:36
non-relational systems as well.

01:06:37
So, a little while ago, we

01:06:40
wanted to show that this

01:06:41
technology can be used across the

01:06:44
board. And so we played around with

01:06:47
MySQL, with FoundationDB, which is

01:06:50
the system behind iCloud, for

01:06:54
example, and many of

01:06:56
the VMware products.

01:06:58
And then we have RocksDB, which is

01:06:59
behind your Instagram and Facebook

01:07:01
and so on. Facebook is really

01:07:03
pushing that open

01:07:04
source storage system.

01:07:08
And things like SAP HANA as well,

01:07:10
we've been focusing on that a

01:07:12
little bit as well, just as a

01:07:12
proof of concept to show that

01:07:14
basically the same methodology can

01:07:16
apply to very different database

01:07:19
management systems in general.

01:07:21
Right. You want to look into

01:07:24
Oracle and take a chunk of their

01:07:26
money, I guess. But you're on the

01:07:29
right track with SAP HANA. It's

01:07:31
kind of on the same level.

01:07:33
So how does that work? I think you

01:07:36
have to have some kind of an agent

01:07:38
inside of your database. For

01:07:40
Postgres, you're probably using

01:07:41
the stats tables, but I guess

01:07:43
you're doing more, right?

01:07:46
Right. This is the idea of, you

01:07:49
know, observability and monitoring

01:07:50
companies. They mainly focus on

01:07:52
gathering all these metrics from

01:07:55
the machine and then getting you a

01:07:59
very nice

01:07:59
visualization on your dashboard.

01:08:01
As a user, you would look at these

01:08:04
metrics and how they evolve over

01:08:07
time, and then they help you guide

01:08:10
the next step, which is some sort

01:08:12
of manual

01:08:12
optimization of your system.

01:08:15
We are moving one step forward and

01:08:17
we're trying to use those metrics

01:08:20
automatically instead of just

01:08:22
giving them back to the user.

01:08:22
So we move from a passive

01:08:24
monitoring approach to an active

01:08:26
approach where the metrics are

01:08:29
collected and then the algorithm

01:08:31
will help you also to

01:08:33
automatically change the

01:08:35
configuration of the system in a

01:08:37
way that it gets faster over time.

01:08:41
And so the metrics that we look at

01:08:43
usually are, well, the algorithm

01:08:46
itself will gather a number of

01:08:48
metrics to help it to improve over

01:08:50
time. And these metrics are

01:08:52
related to, you know, your system

01:08:54
usage, you know, CPU

01:08:56
memory and disk usage.

01:08:59
And other things, for example,

01:09:02
latency and throughput as well

01:09:04
from your Postgres database

01:09:06
management system. So using things

01:09:08
like pg_stat_statements, for

01:09:09
example, for people that are a

01:09:11
little more

01:09:12
familiar with Postgres.

01:09:14
And by design, we refrain from

01:09:17
looking inside your tables or

01:09:18
looking specifically at your

01:09:20
metadata, at your queries, for

01:09:22
example, we refrain from that

01:09:24
because it's easier to basically,

01:09:28
you know, deploy our system in a

01:09:32
way that it's not dangerous for

01:09:34
your data and for your privacy

01:09:36
concerns and things like that.
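
As a hedged illustration of the privacy-by-design point above, here is a sketch of what telemetry-only collection could look like. The query and helper are illustrative assumptions, not DBtune's actual agent code; `pg_stat_statements` and its aggregate counters are real Postgres features, but the names `TELEMETRY_SQL` and `summarize` are hypothetical:

```python
# Hypothetical sketch: the kind of telemetry an agent could gather from
# pg_stat_statements without ever reading table data or query text.

# The query only references aggregate statistics, never row contents.
TELEMETRY_SQL = """
SELECT sum(calls)           AS total_calls,
       sum(total_exec_time) AS total_exec_ms
FROM pg_stat_statements;
"""

def summarize(total_calls: float, total_exec_ms: float) -> dict:
    """Reduce raw counters to the numbers an optimizer cares about:
    a throughput proxy (calls) and mean latency per call (ms)."""
    mean_latency = total_exec_ms / total_calls if total_calls else 0.0
    return {"calls": total_calls, "mean_latency_ms": round(mean_latency, 3)}
```

A real agent would run `TELEMETRY_SQL` through a Postgres driver on a schedule; the point of the sketch is that only aggregate numbers ever leave the machine.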

01:09:38
Right. Okay. And then you send

01:09:41
that to a cloud instance that

01:09:43
visualizes the data, just the

01:09:45
simple stuff, but there's also

01:09:47
machine learning that actually

01:09:49
looks at all the collected data

01:09:51
and I guess tries to find patterns.

01:09:53
And how does that work? I mean,

01:09:58
you probably have a version of the

01:10:00
query parser, the Postgres query

01:10:02
parser in the backend to actually

01:10:04
make sense of this information,

01:10:07
see what the

01:10:07
execution plan would be.

01:10:09
That is just me guessing. I don't

01:10:11
want to spoil your product.

01:10:13
No, that's okay. So the agent is

01:10:18
open source and it gets installed

01:10:21
on your environment. And anyone

01:10:25
fluent in Python can read that in

01:10:27
probably 20 minutes. So it's

01:10:28
pretty, it's not

01:10:29
massive. It's not very big.

01:10:31
That's what gets connected with

01:10:33
our backend system, which is

01:10:37
running in our cloud. And the two

01:10:41
things connect and communicate

01:10:43
back and forth. The agent reports

01:10:46
the metrics and requests what's

01:10:49
the next recommendation from the

01:10:52
optimizer that

01:10:53
runs in our backend.

01:10:56
The optimizer responds with a

01:10:58
recommendation, which is then

01:11:00
enabled in the system through the

01:11:03
agent. And then the agent also

01:11:06
starts to measure what's going on

01:11:08
on the machine before reporting

01:11:10
these metrics back to the backend.

01:11:12
And so this is a feedback loop and

01:11:14
the optimizer gets better and

01:11:16
better at predicting what's going

01:11:17
on on the other side.
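
The feedback loop just described — agent reports metrics, backend recommends, agent applies and measures — can be sketched roughly as below. Everything here is a hypothetical, simplified stand-in: the function names, the single tuned parameter, and the synthetic performance curve are assumptions, not DBtune's actual protocol:

```python
import random

def optimizer_recommend(history):
    """Stub optimizer: propose a shared_buffers fraction of RAM.
    Exploits the best setting seen so far, with a little exploration."""
    if not history:
        return 0.25  # conservative starting guess
    best = max(history, key=lambda h: h["throughput"])
    return min(0.75, max(0.05, best["setting"] + random.uniform(-0.05, 0.05)))

def measure(setting):
    """Stand-in for measuring real throughput after applying a setting.
    Here a synthetic curve peaking near 0.4 of RAM."""
    return 100.0 - 300.0 * (setting - 0.4) ** 2

def tuning_loop(iterations=20):
    """One round trip per iteration: recommend, apply, measure, report."""
    history = []
    for _ in range(iterations):
        setting = optimizer_recommend(history)   # backend recommends
        throughput = measure(setting)            # agent applies and measures
        history.append({"setting": setting, "throughput": throughput})
    return max(history, key=lambda h: h["throughput"])
```

The design choice the sketch mirrors is that the agent stays thin (apply and measure), while all decision-making lives behind `optimizer_recommend`.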

01:11:19
So this is based on machine

01:11:21
learning technology and

01:11:22
specifically probabilistic models,

01:11:24
which I think is the interesting

01:11:26
part here. By using probabilistic

01:11:28
models, the system is able to

01:11:30
predict the performance for a new

01:11:33
guess, but also predict the

01:11:36
uncertainty around that

01:11:38
estimate. And that's, I think,

01:11:40
very powerful to be able to

01:11:41
combine some sort of prediction,

01:11:43
but also how confident you are

01:11:45
with respect to that prediction.

01:11:46
And those things are important

01:11:48
because when you're optimizing a

01:11:50
computer system, of course, you're

01:11:51
running this in production and you

01:11:53
want to make sure that this stays

01:11:57
safe for the

01:11:58
system that is running.

01:11:59
You're changing the system in real

01:12:00
time. So you want to make sure

01:12:02
that these things are done in a

01:12:04
safe way. And these models are

01:12:08
built in a way that they can take

01:12:10
into account all these

01:12:11
unpredictable things that may

01:12:14
otherwise break the system being engineered.
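
As an illustrative sketch (not DBtune's actual model) of "prediction plus uncertainty", a configuration's performance can be tracked as a running mean and variance, and decisions made on a pessimistic bound so an untested setting is never trusted blindly. The class name and the `k`-sigma rule are assumptions for illustration:

```python
import math

class PerfEstimate:
    """Tracks mean and variance of observed performance for one
    configuration, using Welford's online algorithm."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations

    def observe(self, value):
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)

    def std(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else float("inf")

    def safe_lower_bound(self, k=2.0):
        """Pessimistic estimate: mean minus k standard deviations.
        Treated as unbounded uncertainty until two observations exist."""
        s = self.std()
        return self.mean - k * s if math.isfinite(s) else float("-inf")
```

Acting on `safe_lower_bound` rather than `mean` is one simple way to encode "only change production when we are confident the change is safe".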

01:12:17
Right. And you mentioned earlier

01:12:19
that you're looking at the

01:12:21
pg_stat_statements

01:12:25
table, can't come up with

01:12:27
the name right now.

01:12:28
But that means you're not looking

01:12:30
at the actual data. So the data is

01:12:32
secure and it's not going to be

01:12:34
sent to your backend, which I

01:12:35
think could be a valid fear from a

01:12:38
lot of people like, okay, what is

01:12:40
actually being sent, right?

01:12:42
Exactly. So Chris, when we talk

01:12:44
with large telcos and big banks,

01:12:47
the first thing that they say,

01:12:49
what are you doing? What's the

01:12:51
data? So you need to sit down and

01:12:53
meet their infosec teams and

01:12:54
explain to them that we're not

01:12:56
transferring any of that data.

01:12:58
And it's literally just

01:13:00
telemetrics. And those telemetrics

01:13:03
usually are not sensitive in terms

01:13:04
of privacy and so on. And so

01:13:06
usually there is a meeting that

01:13:08
happens with their infosec teams,

01:13:10
especially for big banks and

01:13:11
telcos, where you clarify what is

01:13:14
being sent, and then they

01:13:16
look at the source code because

01:13:17
the agent is open source.

01:13:18
So you can look at the open source

01:13:20
and just realize that nothing

01:13:23
sensitive is being

01:13:24
sent to the internet.

01:13:26
Right.

01:13:27
And perhaps to add one more

01:13:29
element there. So for the most

01:13:32
conservative of our clients, we

01:13:34
also provide a way to deploy this

01:13:37
technology in a

01:13:38
completely offline manner.

01:13:40
So when everybody's of course

01:13:43
excited about digital

01:13:45
transformations and moving to the

01:13:47
cloud and so on, we actually went

01:13:48
kind of backwards and provided a

01:13:52
way of deploying this, which is

01:13:55
sending a standalone software that

01:13:57
runs in your environment and

01:14:00
doesn't communicate

01:14:00
at all to the internet.

01:14:01
So we have that as an option as

01:14:03
well for our users. And that

01:14:06
setup is a little harder for us to

01:14:08
deploy because we don't have

01:14:09
direct access to it anymore. So

01:14:11
it's easy for us to deploy the

01:14:13
cloud-based version.

01:14:14
But if you, you know, in some

01:14:15
cases, you know, there is not very

01:14:17
much you can do if they will not

01:14:19
allow you to go through the

01:14:20
internet. There are companies that

01:14:22
don't buy Salesforce for that

01:14:24
reason. So if you don't buy

01:14:25
Salesforce, you probably won't buy

01:14:26
from anybody else on the planet.

01:14:28
So for those scenarios,

01:14:30
that's what we do.

01:14:34
Right. So how does it work

01:14:37
afterwards? So the machine

01:14:38
learning looks into the data,

01:14:40
tries to find patterns, has some

01:14:43
optimization or... Is

01:14:47
it only queries or does it also

01:14:49
give me like recommendations on

01:14:50
how to optimize the Postgres

01:14:52
configuration itself?

01:14:53
And how does that present those? I

01:14:55
guess they're going

01:14:56
to be shown in the UI.

01:14:57
So we're specifically focusing on

01:14:59
that aspect, the optimization

01:15:01
of the configuration of Postgres.

01:15:05
So that's our focus. And so

01:15:07
the things like if you're familiar

01:15:09
with Postgres, things like the

01:15:10
shared buffers, which is this

01:15:12
buffer, which contains the copy of

01:15:18
the data from tables from the disk

01:15:20
and keeps a

01:15:21
local copy in RAM.

01:15:23
And that data is useful to keep it

01:15:27
warm in RAM, because when you

01:15:29
interact with the CPU, then you

01:15:30
don't need to go all the way back

01:15:31
to disk. And so if you go all the

01:15:33
way back to disk, there is an

01:15:34
order of magnitude more like delay

01:15:36
and latency and slow

01:15:39
down based on that.

01:15:40
So you try to keep the data close

01:15:41
to where it's processed. So trying

01:15:43
to keep the data in cache as much

01:15:45
as possible, and shared buffers is a

01:15:47
form of cache where the cache used

01:15:49
in this case is a piece of RAM.

01:15:50
And so sizing these shared

01:15:52
buffers, for example, is important

01:15:54
for performance. And then

01:15:56
there are a number of other things

01:15:58
similar to that, but slightly

01:16:00
different. For example, in

01:16:01
Postgres, there is an allocation

01:16:03
of a buffer for each query.

01:16:05
So each query has a buffer which

01:16:08
can be used as an

01:16:11
operating memory for the query to

01:16:13
be processed. So if you're doing

01:16:14
some sort of like sorting, for

01:16:16
example, in the query that small

01:16:18
memory is used again.

01:16:20
And you want to keep that memory

01:16:22
close to the CPU and specifically

01:16:23
the work_mem parameter, for

01:16:27
example, is what helps

01:16:29
with that specific thing.

01:16:30
And so we optimize all

01:16:32
these things in a way that the

01:16:34
flow of data from disk to the

01:16:36
registers of the CPU, it's very,

01:16:38
very smooth and it's optimized. So

01:16:40
we optimize the locality of the

01:16:42
data, both spatial and temporal

01:16:46
locality if you want to use the

01:16:48
technical terms for that.
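
To make the sizing discussion concrete, here is a sketch of the common community rules of thumb: roughly 25% of RAM for shared_buffers, and work_mem derived from the remaining memory split across concurrent operations. These are generic starting heuristics, not DBtune's learned values; the function name and constants are assumptions:

```python
def starting_config(ram_mb: int, max_connections: int) -> dict:
    """Conservative first-guess Postgres memory settings, in MB,
    following widely cited rules of thumb."""
    shared_buffers = ram_mb // 4  # ~25% of RAM heuristic
    # Leave headroom for the OS page cache, then split the rest across
    # connections, assuming a few sort/hash operations per query.
    work_mem = max(4, (ram_mb - shared_buffers) // (max_connections * 3))
    return {"shared_buffers_mb": shared_buffers, "work_mem_mb": work_mem}
```

For example, a 16 GB box with 100 connections would get roughly 4 GB of shared_buffers and 40 MB of work_mem under this rule; a tuner like the one discussed here would then refine both for the actual workload.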

01:16:50
Right. Okay. So it doesn't help me

01:16:52
specifically with my stupid

01:16:53
queries. I still have to find a

01:16:55
consultant to fix that or find

01:16:56
somebody else in the team.

01:16:57
Yeah, for now, that's

01:16:59
correct. You know, we will

01:17:02
probably focus on that in the

01:17:04
future. But for now, the way you

01:17:06
usually optimize your queries is

01:17:08
that you optimize your queries and

01:17:09
then if you want to see what's

01:17:11
the actual benefit, you should

01:17:13
also optimize your parameters.

01:17:15
And so if you want to do it really

01:17:16
well, you should optimize your

01:17:17
queries, then you go optimize your

01:17:18
parameters and go back optimize

01:17:20
your queries again, and you

01:17:21
kind of converge

01:17:22
into this process.

01:17:23
So now that one of the two is

01:17:25
fully automated, you can focus on

01:17:27
the queries and, you know,

01:17:30
speed up the process of optimizing

01:17:32
the queries by a large margin.

01:17:34
So in terms of benefits,

01:17:36
of course, if you optimize your

01:17:37
queries, you rewrite your

01:17:38
queries, you can get, you know,

01:17:40
two or three orders of magnitude

01:17:41
performance improvement, which is

01:17:43
which is really, really great.

01:17:44
If you optimize the configuration

01:17:46
of your system, you can get, you

01:17:48
know, an order of magnitude in

01:17:49
terms of performance improvement.

01:17:51
And that's, that's still very,

01:17:52
very significant.

01:17:54
Despite what many people say,

01:17:57
it's possible to get an order of

01:17:59
magnitude improvement in

01:18:00
performance if your system's

01:18:02
baseline is fairly

01:18:04
basic, let's say.

01:18:05
And the interesting fact is that

01:18:08
by the nature of Postgres, for

01:18:13
example, the default configuration

01:18:14
of Postgres needs to be pretty

01:18:16
conservative because Postgres

01:18:18
needs to be able to run on big

01:18:19
server machines, but also on

01:18:20
smaller machines.

01:18:22
So the form factor needs to be

01:18:23
taken into account when you define

01:18:25
the default configuration of

01:18:27
Postgres. And so by that fact,

01:18:29
it needs to be

01:18:30
pretty conservative.

01:18:31
And so what you can

01:18:34
observe out there is that this

01:18:36
problem is so complex that people

01:18:37
don't really change the default

01:18:39
configuration of Postgres when

01:18:40
they run on a

01:18:41
much bigger instance.

01:18:42
And so there is a lot of

01:18:43
performance improvement that can

01:18:44
be obtained by changing that

01:18:48
configuration to a better suited

01:18:50
configuration. And the

01:18:51
point of doing this through

01:18:52
automation and through things like

01:18:54
DBtune is that you can then refine

01:18:58
the configuration of your system

01:18:59
specifically for

01:19:00
the specific use case that you

01:19:04
have, like your application, your

01:19:05
workload, the machine size, and

01:19:07
all these things are considered

01:19:08
together to give you the best

01:19:10
outcome for your use case,

01:19:13
which is, I think, the new part,

01:19:14
the novelty of

01:19:15
this approach, right?

01:19:16
Because if you're doing this

01:19:18
through some sort of heuristics,

01:19:20
they usually don't really get to

01:19:22
cover all these different things.

01:19:24
And they will always be kind of

01:19:26
subpar with respect to what you can do

01:19:28
with an observability loop, right?

01:19:32
Yeah, and I think you mentioned

01:19:34
that a lot of people don't touch

01:19:35
the configuration. I think there

01:19:37
is the problem that the

01:19:40
Postgres

01:19:41
configuration is very complex.

01:19:44
A lot of parameters depend on each

01:19:45
other. And it's, I mean, I'm

01:19:48
coming from a Java background, and

01:19:49
we have the same thing with

01:19:51
garbage collectors. Optimizing a

01:19:53
garbage collector, for

01:19:54
every single algorithm you have

01:19:55
like 20 or 30 parameters, all of

01:19:58
them depend on each other.

01:19:59
Changing one may completely

01:20:01
disrupt all the other ones. And I

01:20:03
think that is what a lot of people

01:20:05
kind of shy away from.

01:20:07
And then you Google, and then

01:20:10
there's like the big Postgres

01:20:11
community telling you, "no, you

01:20:13
really don't want to change that

01:20:14
parameter until you really know

01:20:16
what you're doing," and you don't

01:20:18
know, so you leave it alone.

01:20:20
So in this case, I think something

01:20:21
like Dbtune will be or is

01:20:23
absolutely amazing.

01:20:24
Exactly. And, you know, if you if

01:20:27
you spend some time on blog posts

01:20:29
learning about the Postgres

01:20:31
parameters, you get that type of

01:20:33
feedback, and it takes a lot of time

01:20:34
to learn it in a way that you can

01:20:35
feel confident and comfortable making

01:20:38
changes in your production system,

01:20:39
especially if you're working in a

01:20:40
big corporation.

01:20:41
And the idea here is that at

01:20:44
DBtune we are partnered with

01:20:46
leading Postgres experts as well.

01:20:49
Magnus Hagander, for example,

01:20:50
president of the Postgres Europe

01:20:53
organization, for example, has

01:20:55
been doing this manual tuning for

01:20:57
about two decades and we worked

01:20:59
very closely with him to be able

01:21:00
to really do this also in a

01:21:02
very safe manner, right.

01:21:04
You should basically trust our

01:21:06
system to be doing the right thing

01:21:08
because it's engineered in a way

01:21:10
that incorporates a lot of domain

01:21:11
expertise, so it's not just machine

01:21:13
learning, it's also about the

01:21:15
specific Postgres domain expertise

01:21:17
that you need to do

01:21:18
this well and safely.

01:21:21
Oh, cool. All right.

01:21:22
We're almost out of time. Last

01:21:25
question. What do you think is the

01:21:27
next big thing in Postgres and

01:21:29
databases, in cloud, in DB tuning?

01:21:34
That's a huge question. So we've

01:21:38
seen all sorts of things happening

01:21:39
recently with, of course, AI stuff

01:21:42
but, you know, I think it's, it's

01:21:44
too simple to talk about that once

01:21:46
more. I think you guys covered

01:21:47
those type of topics a lot.

01:21:48
I think what's interesting is that

01:21:50
there is a lot that has

01:21:52
been done to support those type of

01:21:55
models, for example the

01:21:57
rise of vector databases for

01:21:58
example, which was I think quite

01:22:01
interesting. Vector databases like,

01:22:02
for example the extension for

01:22:04
Postgres, pgvector, which was around

01:22:06
for a little while, but in the last

01:22:08
year you really saw a huge

01:22:10
adoption and that's driven by all

01:22:13
sort of large language models that

01:22:14
use this vector embeddings and

01:22:16
that's, I think, a trend that we

01:22:16
will see for a little while.

01:22:20
For example, our lead investor

01:22:23
42CAP, recently invested in

01:22:25
in another company that does this

01:22:29
type of thing as well, Qdrant,

01:22:30
for example, and there are a

01:22:32
number of companies that focus on

01:22:34
that: Milvus, Chroma, Zilliz,

01:22:37
you know, there are a number of

01:22:38
companies, pg_vectorize as well by

01:22:41
the Tembo friends.

01:22:42
So this is certainly a trend that

01:22:45
will stay for a fairly

01:22:47
long time. In terms of database

01:22:49
systems, I am personally very

01:22:52
excited about the huge shift left

01:22:54
that is happening in the industry.

01:22:56
Shift left meaning all the

01:22:58
database-as-a-service offerings, you know,

01:23:02
from Azure Flexible Server,

01:23:03
Amazon RDS, Google Cloud SQL,

01:23:06
those are the big ones, but there

01:23:07
are a number of other companies

01:23:08
that are doing the same and

01:23:10
they have very interesting ideas,

01:23:12
things that are really, you know,

01:23:15
shaping that whole area, so I

01:23:17
can mention a few for example,

01:23:18
Tembo, even EnterpriseDB and so

01:23:22
on. There's so much going on

01:23:24
in that space, and in some sense

01:23:26
DBtune is really going in that

01:23:28
specific direction, right? So

01:23:30
helping to automate more and more

01:23:32
of what you need to do in a

01:23:35
database when you're operating a

01:23:36
database. From a machine learning

01:23:38
perspective, and then I

01:23:39
will stop there, Chris. I think

01:23:40
we're running out of time, from

01:23:42
machine learning perspective.

01:23:42
I'm really interested in and

01:23:45
that's something that we've been

01:23:46
studying for a few years now in my

01:23:48
academic team, with my PhD

01:23:50
students. You know, pushing

01:23:53
pushing the boundaries of what we

01:23:54
can do in terms of using machine

01:23:56
learning for computer systems and

01:23:58
specifically when you get computer

01:24:01
systems that have hundreds, if not

01:24:03
thousands of parameters and

01:24:06
variables to be optimized at the

01:24:07
same time jointly.

01:24:08
And we have recently

01:24:11
published a few pieces of work

01:24:13
that you can find on my

01:24:14
Google Scholar on that specific

01:24:15
topic. So it's a little math-y, you

01:24:17
know, it's a little hard to maybe

01:24:19
read in parts, but it's quite

01:24:20
rewarding to see that these new

01:24:22
pieces of technology are

01:24:24
becoming available to

01:24:26
practitioners and people that work

01:24:28
on applications as well.

01:24:29
So that perhaps the attention will

01:24:32
move away at some point from full

01:24:35
LLMs to also other areas in

01:24:38
machine learning and AI that are

01:24:39
also equally

01:24:40
interesting in my opinion.

01:24:42
Perfect. That's, that's beautiful.

01:24:45
Just send me the link. I'm happy

01:24:47
to put it into the show notes. I

01:24:48
bet there's quite a few people

01:24:50
that would be really really into

01:24:52
reading those things. I'm not big

01:24:54
on mathematics that's probably way

01:24:55
over my head, but

01:24:56
that's, that's fine.

01:24:58
Yeah, that was a pleasure.

01:25:01
Thank you for being here.

01:25:04
And I hope, yeah, I hope we see

01:25:07
each other somewhere at a Postgres

01:25:09
conference. We just briefly talked

01:25:10
about that before the

01:25:12
recording started. So yeah, thank

01:25:14
you for being here.

01:25:16
And for the audience.

01:25:18
I see you.

01:25:20
I hear you next week or you hear

01:25:22
me next week with the next

01:25:23
episode. And thank you

01:25:25
for being here as well.

01:25:26
Awesome. For the audience: we will be

01:25:28
at the Postgres Switzerland

01:25:30
conference as sponsors, and we will be

01:25:32
giving talks there so if you come

01:25:35
by, feel free to say hi, and we

01:25:38
can grab coffee together.

01:25:39
Thank you very much. Perfect. Yes.

01:25:41
Thank you. Bye bye.

01:25:44
The Cloud Commute podcast is sponsored by

01:25:46
simplyblock, your own elastic

01:25:47
block storage engine for the cloud.

01:25:49
Get higher IOPS and low, predictable

01:25:51
latency while bringing down your

01:25:52
total cost of ownership.

01:25:54
www.simplyblock.io