In this week's episode of Simplyblock's Cloud Commute podcast, host Chris Engelbert interviews Luigi Nardi, founder and CEO of DBtune. Luigi provides insights into the complexities of database optimization and the significant role of machine learning in improving system efficiency.
In this episode of Cloud Commute, Chris and Luigi discuss:
- Introduction to DBtune: Machine learning for database optimization
- Postgres tuning: Automating configuration for performance and sustainability
- Security and privacy concerns in database optimization and how DBtune addresses them
- Future trends in databases: Vector databases, machine learning, and database-as-a-service (DBaaS)
Interested in learning more about the cloud infrastructure stack, like storage, security, and Kubernetes? Head to our website (www.simplyblock.io/cloud-commute-podcast) for more episodes, and follow us on LinkedIn (www.linkedin.com/company/simplyblock-io). You can also check out the detailed show notes on YouTube (www.youtube.com/watch?v=lkquofjg9zo).
You can find Luigi Nardi on X (@luiginardi) and LinkedIn (/nardiluigi).
About simplyblock:
Simplyblock is an intelligent database storage orchestrator for IO-intensive workloads in Kubernetes, including databases and analytics solutions. It uses smart NVMe caching to reduce read I/O latency and speed up queries. A single system connects local NVMe disks, GP3 volumes, and S3, making it easier to handle storage capacity and performance. With the benefits of thin provisioning, storage tiering, and volume pooling, your database workloads get better performance at lower cost without changes to existing AWS infrastructure.
👉 Get started with simplyblock: https://www.simplyblock.io/buy-now
🏪 simplyblock AWS Marketplace: https://aws.amazon.com/marketplace/seller-profile?id=seller-fzdtuccq3edzm
01:00:00
So in terms of benefits: of course, if you optimize your queries, if you rewrite your queries, you can get two or three orders of magnitude of performance improvement, which is really, really great. If you optimize the configuration of your system, you can get an order of magnitude in terms of performance improvement. And that's still very, very significant.
01:00:23
You're listening to simplyblock's Cloud Commute Podcast, your weekly 20-minute podcast about cloud technologies, Kubernetes, security, sustainability, and more.
01:00:32
Hello, everyone. Welcome back to this week's episode of simplyblock's Cloud Commute podcast. This week I have Luigi with me. Luigi, obviously, from Italy. I don't think he has anything to do with Super Mario, but he can tell us about that himself. So welcome, Luigi. Sorry for the really bad joke.
01:00:54
Glad to be here, Chris.
01:00:56
So maybe you start with introducing yourself. Who are you? We already know where you're from, but I'm not sure if you're actually residing in Italy. So maybe just tell us a little bit about you.
01:01:08
Sure. Yes, I'm originally Italian. I left the country to explore and study abroad a little while ago. In 2006, I moved to France and stayed there for a while; I spent almost seven years in total in France eventually. I did my PhD program there in Paris and worked in a company as a software engineer as well. Then I moved to the UK for a few years and did a postdoc at Imperial College London, in downtown London, and then moved to the US. I lived in California, in Palo Alto more precisely, for a few years. And then in 2019, I came back to Europe and established my residency in Sweden.
01:01:59
Right. Okay. So you're in Sweden right now.
01:02:02
That's correct.
01:02:04
Oh, nice. Nice. How's the weather? Is it still cold?
01:02:08
It's great. Everybody thinks that Sweden has very bad weather, but Sweden is a very, very long country. So if you reside in the south, actually the weather is pretty decent. It doesn't snow very much.
01:02:20
That is very true. I actually love Stockholm, a very beautiful city.
01:02:25
All right. One thing you haven't mentioned: you're actually the founder and CEO of DBtune. So you left out the best part, I guess. Maybe tell us a little bit about DBtune now.
01:02:39
Sure. DBtune is a company that is about four years old now. It's a spinoff from Stanford University and the commercialization of about a decade of research and development in academia. We were working at the intersection of machine learning and computer systems, specifically on the use of machine learning to optimize computer systems. This is an area that around 2018 or 2019 received a new name, MLSys, machine learning and systems. This area is quite prominent these days, and you can do very beautiful things with the combination of these two pieces.
01:03:24
So DBtune is specifically focused on using machine learning to optimize computer systems, and within that area we are optimizing databases, database management systems more specifically. The idea is that you can automate the process of tuning databases. We are focusing on the optimization of the parameters of the database management system, the parameters that govern the runtime system, meaning the way the disk, the RAM, and the CPU interact with each other. You take the von Neumann model and try to make it as efficient as possible by optimizing the parameters that govern that interaction.
01:04:12
By doing that, you automate the process, which means that database engineers and database administrators can focus on other tasks that are equally important or even more important. At the same time you get great performance, and you can reduce your cloud costs as well: if you're running in the cloud in an efficient way, you can optimize the cloud costs. And at the same time you also get a check on your greenops, meaning the sustainability aspect of it. This is one of the examples I really like of how you can be an engineer and provide quite a big contribution in terms of sustainability as well, because you can connect these two things by making your software run more efficiently and then scaling down your operations.
01:05:02
That is true. And, yeah, I've never thought about that, but sure. I mean, if I get my queries to run more efficiently and use less compute time and compute power, huh, that is actually a good thing. Now I'm feeling much better.
01:05:20
I'm feeling much better too. Since we started talking a little bit more about this, we have a blog post that will be released pretty soon about this very specific topic. I think this connection between making software run efficiently and the downstream effects of that efficiency, both on your infrastructure cost and on the efficiency of your operations, is often underestimated, I would say.
01:05:54
Yeah, that's fair. It would be nice if you, when it's published, just send me over the link, and I'll put it into the show notes, because I think that will be really interesting to a lot of people. As you said, specifically for developers that would otherwise have a hard time contributing anything in terms of sustainability. You mentioned database systems, but I think DBtune specifically is focused on Postgres, isn't it?
01:06:25
Right. Today we are focusing on Postgres. As a proof of concept, though, we have applied similar technology to five different database management systems, including relational and non-relational systems as well. A little while ago, we wanted to show that this technology can be used across the board. So we played around with MySQL, and with FoundationDB, which is the system behind iCloud, for example, and many of the VMware products. And then we have RocksDB, which is behind Instagram and Facebook and so on; Facebook is very much pushing that open source storage system. And things like SAP HANA as well, we've been focusing on that a little bit, just as a proof of concept to show that basically the same methodology can apply to very different database management systems in general.
01:07:21
Right. You want to look into Oracle and take a chunk of their money, I guess. But you're on the right track with SAP HANA; it's kind of on the same level. So how does that work? I think you have to have some kind of an agent inside of your database. For Postgres, you're probably using the stats tables, but I guess you're doing more, right?
01:07:46
Right. This is the idea: you know observability and monitoring companies. They mainly focus on gathering all these metrics from the machine and then giving you a very nice visualization on your dashboard. As a user, you would look at these metrics and how they evolve over time, and they help guide you to the next step, which is some sort of manual optimization of your system. We are moving one step forward and trying to use those metrics automatically instead of just giving them back to the user. So we move from a passive monitoring approach to an active approach, where the metrics are collected and then the algorithm also helps you automatically change the configuration of the system in a way that makes it faster over time.
01:08:41
As for the metrics we look at: the algorithm itself will gather a number of metrics to help it improve over time. These metrics are related to your system usage, so CPU, memory, and disk usage, and other things, for example latency and throughput from your Postgres database management system. So we use things like pg_stat_statements, for example, for people that are a little more familiar with Postgres.
01:09:14
And by design, we refrain from looking inside your tables or looking specifically at your metadata or your queries, for example. We refrain from that because it makes it easier to deploy our system in a way that is not dangerous for your data, your privacy concerns, and things like that.
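The "telemetry only, never data" design can be sketched in Python, the language the open-source agent is written in. Everything below is illustrative: the field names, the helper function, and the sample rows are hypothetical stand-ins, not DBtune's actual agent or protocol. The sketch only shows the idea that aggregate numbers leave the host while query text and table data never do.

```python
# Hypothetical sketch of a "metrics only" agent payload. Field names are
# illustrative, not DBtune's actual protocol.

SENSITIVE_KEYS = {"query", "rows", "table_data"}  # must never leave the host

def build_payload(pg_stat_rows, system_stats):
    """Aggregate pg_stat_statements-style rows into anonymous telemetry."""
    total_calls = sum(r["calls"] for r in pg_stat_rows)
    total_time_ms = sum(r["total_exec_time"] for r in pg_stat_rows)
    payload = {
        "calls": total_calls,
        # average statement latency in milliseconds, across all statements
        "avg_latency_ms": total_time_ms / total_calls if total_calls else 0.0,
        "cpu_percent": system_stats["cpu_percent"],
        "mem_percent": system_stats["mem_percent"],
    }
    # defensive check: no raw SQL or table data may appear in the payload
    assert not SENSITIVE_KEYS & payload.keys()
    return payload

rows = [
    {"query": "SELECT ...", "calls": 90, "total_exec_time": 450.0},
    {"query": "UPDATE ...", "calls": 10, "total_exec_time": 250.0},
]
print(build_payload(rows, {"cpu_percent": 37.5, "mem_percent": 61.0}))
```

Note that the query strings present in the input rows are dropped before anything is serialized; only counts, timings, and resource percentages survive.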
01:09:38
Right. Okay. And then you send that to a cloud instance that visualizes the data, just the simple stuff, but there's also machine learning that actually looks at all the collected data and, I guess, tries to find patterns. And how does that work? I mean, you probably have a version of the Postgres query parser in the backend to actually make sense of this information and see what the execution plan would be. That is just me guessing; I don't want to spoil your product.
01:10:13
No, that's okay. So the agent is open source and it gets installed in your environment. Anyone fluent in Python can read it in probably 20 minutes, so it's not massive; it's not very big. That's what gets connected to our backend system, which is running in our cloud. The two things connect and communicate back and forth: the agent reports the metrics and requests the next recommendation from the optimizer that runs in our backend. The optimizer responds with a recommendation, which is then enabled in the system through the agent. The agent also starts to measure what's going on on the machine before reporting these metrics back to the backend. So this is a feedback loop, and the optimizer gets better and better at predicting what's going on on the other side.
01:11:19
This is based on machine learning technology, specifically probabilistic models, which I think is the interesting part here. By using probabilistic models, the system is able to predict the performance for a new guess, but also to predict the uncertainty around that estimate. And that's, I think, very powerful: being able to combine some sort of prediction with how confident you are in that prediction. Those things are important because when you're optimizing a computer system, of course, you're running this in production, and you want to make sure that it stays safe for the system that is running. You're changing the system in real time, so you want to make sure that these things are done in a safe way. These models are built in a way that they can take into account all the unpredictable things that might otherwise break the system you're engineering.
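The recommend, apply, measure, report loop, together with the prediction-plus-uncertainty idea, can be sketched as a toy in Python. This is a minimal stand-in, not DBtune's optimizer: the simulated latency curve, the single knob with five candidate values, and the lower-confidence-bound selection rule are all assumptions made for illustration.

```python
import random
import statistics

random.seed(42)

def measure_latency(knob_mb):
    """Simulated, noisy system response; best around 64 MB (illustrative)."""
    return (knob_mb - 64) ** 2 / 100 + 5 + random.gauss(0, 0.5)

candidates = [4, 16, 64, 256, 1024]          # possible knob settings (MB)
observations = {c: [] for c in candidates}   # measured latencies per setting

for _ in range(15):                          # feedback-loop iterations
    untried = [c for c in candidates if len(observations[c]) < 2]
    if untried:
        # explore: every candidate gets measured at least twice
        choice = untried[0]
    else:
        # prediction = sample mean; uncertainty = standard error of the mean
        def lcb(c):
            obs = observations[c]
            mean = statistics.mean(obs)
            sem = statistics.stdev(obs) / len(obs) ** 0.5
            return mean - sem                # optimistic lower bound
        choice = min(candidates, key=lcb)
    # "apply" the recommendation, then "measure" and record the result
    observations[choice].append(measure_latency(choice))

best = min(candidates, key=lambda c: statistics.mean(observations[c]))
print(best)  # the best-performing setting found by the loop: 64
```

The safety aspect Luigi mentions maps onto the uncertainty term: a real optimizer can decline to recommend a configuration whose predicted performance is good but whose uncertainty is still too high for a production system.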
01:12:17
Right. And you mentioned earlier that you're looking at the pg_stat_statements table, I couldn't come up with the name right now. But that means you're not looking at the actual data. So the data is secure and it's not going to be sent to your backend, which I think could be a valid fear for a lot of people: okay, what is actually being sent, right?
01:12:42
Exactly. So, Chris, when we talk with large telcos and big banks, the first thing they say is: what are you doing, and what's the data? So you need to sit down with their infosec teams and explain to them that we're not transferring any of that data. It's literally just telemetrics, and those telemetrics usually are not sensitive in terms of privacy and so on. So usually there is a meeting with their infosec teams, especially for big banks and telcos, where you clarify what is being sent, and then they look at the source code, because the agent is open source. You can look at the open source and just realize that nothing sensitive is being sent to the internet.
01:13:26
Right.
01:13:27
And perhaps to add one more element there: for the most conservative of our clients, we also provide a way to deploy this technology in a completely offline manner. So while everybody is, of course, excited about digital transformations and moving to the cloud and so on, we actually went kind of backwards and provided a way of deploying this which is sending a standalone piece of software that runs in your environment and doesn't communicate with the internet at all. We have that as an option as well for our users. That setup is a little harder for us to support, because we don't have direct access to it anymore; it's easier for us to deploy the cloud-based version. But in some cases there is not very much you can do: some environments will simply not allow you to go through the internet. There are companies that don't buy Salesforce for that reason. And if you don't buy Salesforce, you're probably not buying from anybody else on the planet. So for those scenarios, that's what we do.
01:14:34
Right. So how does it work afterwards? The machine learning looks into the data, tries to find patterns, and comes up with some optimization. Is it only queries, or does it also give me recommendations on how to optimize the Postgres configuration itself? And how does it present those? I guess they're going to be shown in the UI.
01:14:57
So we're specifically focusing on that aspect: the optimization of the configuration of Postgres. That's our focus. If you're familiar with Postgres, think of things like shared_buffers, which is the buffer that contains a copy of the data from the tables on disk and keeps a local copy in RAM. It's useful to keep that data warm in RAM, because when you interact with the CPU you then don't need to go all the way back to disk. If you do go all the way back to disk, there is an order of magnitude more delay and latency and slowdown. So you try to keep the data close to where it's processed, keeping the data in cache as much as possible; shared_buffers is a form of cache, where the cache used in this case is a piece of RAM.
01:15:50
And so sizing these shared buffers, for example, is important for performance. Then there are a number of other things similar to that, but slightly different. For example, in Postgres there is an allocation of a buffer for each query. Each query has a buffer which can be used as operating memory for the query to be processed. So if you're doing some sort of sorting, for example, in the query, that small piece of memory is used again. You want to keep that memory close to the CPU, and specifically the work_mem parameter, for example, is what helps with that specific thing. And so we optimize all these things in a way that the flow of data from the disk to the registers of the CPU is very, very smooth and optimized. We optimize the locality of the data, both spatial and temporal locality, if you want to use the technical terms for that.
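To make the conservatism of the defaults concrete: Postgres ships with shared_buffers = 128MB and work_mem = 4MB regardless of machine size. A sketch of widely cited community rules of thumb (static heuristics for a dedicated Postgres host, not DBtune's workload-specific learned values) shows how far a large server can drift from those defaults:

```python
def starting_config(ram_gb, max_connections=100):
    """Common community rules of thumb for a dedicated Postgres host.

    These are static starting points, not tuned values: roughly 25% of RAM
    for shared_buffers, and a per-connection slice of another 25% for
    work_mem, leaving the rest for the OS page cache and other processes.
    """
    shared_buffers_mb = int(ram_gb * 1024 * 0.25)
    work_mem_mb = max(4, int(ram_gb * 1024 * 0.25 / max_connections))
    return {
        "shared_buffers": f"{shared_buffers_mb}MB",
        "work_mem": f"{work_mem_mb}MB",
    }

for name, value in starting_config(ram_gb=64).items():
    # on a real server, ALTER SYSTEM writes these to postgresql.auto.conf
    print(f"ALTER SYSTEM SET {name} = '{value}';")
```

On a 64 GB machine this suggests shared_buffers in the tens of gigabytes, two orders of magnitude above the 128MB default, which is exactly the gap Luigi describes when instances grow but configurations stay untouched.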
01:16:50
Right. Okay. So it doesn't help me specifically with my stupid queries. I still have to find a consultant to fix that, or find somebody else in the team.
01:16:57
Yeah, for now that's correct. We will probably focus on that in the future. But for now, the way it works is that you optimize your queries, and then, if you want to see the actual benefit, you should also optimize your parameters. If you want to do it really well, you optimize your queries, then you optimize your parameters, then you go back and optimize your queries again, and you kind of converge in this process. So now that one of the two is fully automated, you can focus on the queries and speed up the process of optimizing the queries by a large margin.
01:17:34
So in terms of benefits: of course, if you optimize your queries, if you rewrite your queries, you can get two or three orders of magnitude of performance improvement, which is really, really great. If you optimize the configuration of your system, you can get an order of magnitude in terms of performance improvement. And that's still very, very significant.
01:17:54
Despite what many people say, it is possible to get an order of magnitude improvement in performance if your baseline system is fairly basic, let's say. And the interesting fact is that, by the nature of Postgres, the default configuration of Postgres needs to be pretty conservative, because Postgres needs to be able to run on big server machines but also on smaller machines. The form factor needs to be taken into account when you define the default configuration of Postgres, and so by that fact it needs to be pretty conservative.
01:18:31
And what you can observe out there is that this problem is so complex that people don't really change the default configuration of Postgres when they run on a much bigger instance. So there is a lot of performance improvement that can be obtained by changing that configuration to a better-suited one. The point of doing this through automation and through things like DBtune is that you can then refine the configuration of your system specifically for the use case that you have: your application, your workload, the machine size. All these things are considered together to give you the best outcome for your use case, which is, I think, the new part, the novelty of this approach, right?
01:19:16
Because if you're doing this through some sort of heuristics, they usually don't really get to cover all these different things. And they will always be somewhat suboptimal with respect to what you can do with an observability loop, right?
01:19:32
Yeah, and you mentioned that a lot of people don't touch the configuration. I think the problem is that the Postgres configuration is very complex. A lot of parameters depend on each other. I mean, I'm coming from a Java background, and we have the same thing with garbage collectors. Optimizing a garbage collector: for every single algorithm you have like 20 or 30 parameters, all of them depend on each other, and changing one may completely disrupt all the other ones. I think that is what a lot of people shy away from. And then you Google, and there's the big Postgres community telling you, "no, you really don't want to change that parameter until you really know what you're doing," and you don't know, so you leave it alone. So in this case, I think something like DBtune will be, or is, absolutely amazing.
01:20:24
Exactly. And, you know, if you spend some time on blog posts learning about the Postgres parameters, you get that type of feedback, and it takes a lot of time to learn it in a way that makes you feel confident and comfortable making changes in your production system, especially if you're working in a big corporation. The idea here is that at DBtune we are partnered with leading Postgres experts as well. Magnus Hagander, for example, president of the PostgreSQL Europe organization, has been doing this manual tuning for about two decades, and we worked very closely with him to be able to do this in a very safe manner. You should basically trust our system to be doing the right thing, because it's engineered in a way that incorporates a lot of domain expertise. So it's not just machine learning; it's also about the specific Postgres domain expertise that you need to do this well and safely.
01:21:21
Oh, cool. All right.
01:21:22
We're almost out of time, so last question: what do you think is the next big thing in Postgres and databases, in cloud, in database tuning?
01:21:34
That's a huge question. We've seen all sorts of things happening recently with, of course, the AI stuff, but I think it's too simple to talk about that once more; I think you guys have covered those types of topics a lot. What's interesting is that there is a lot that has been done to support those types of models, for example the rise of vector databases, which I think was quite interesting. Vector databases, like for example the extension for Postgres, pgvector, have been around for a little while, but in the last year you really saw a huge adoption, and that's driven by all sorts of large language models that use these vector embeddings. That's a trend, I think, that we will see for a little while.
01:22:20
For example, our lead investor, 42CAP, recently invested in another company that does this type of thing as well, Qdrant, for example. And there are a number of companies that focus on that: Milvus and Chroma, Zilliz, you know, a number of companies, and pg_vectorize as well, by our friends at Tembo. So this is certainly a trend that will stay, and for a fairly long time. In terms of database systems, I am personally very excited about the huge shift left that is happening in the industry.
01:22:56
Shift left meaning all the database-as-a-service offerings, from Azure Flexible Server to Amazon RDS and Google Cloud SQL. Those are the big ones, but there are a number of other companies doing the same, and there are very interesting ideas that are really shaping that whole area. I can mention a few, for example Tembo, and even EnterpriseDB and so on. There's so much going on in that space, and in some sense DBtune is really going in that specific direction, right? Helping to automate more and more of what you need to do when you're operating a database. And then, from a machine learning perspective, and I will stop after this, Chris, since I think we're running out of time:
01:23:42
I'm really interested in, and that's something that we've been studying for a few years now in my academic team, with my PhD students, pushing the boundaries of what we can do in terms of using machine learning for computer systems, specifically when you get computer systems that have hundreds, if not thousands, of parameters and variables to be optimized jointly at the same time. We have recently published a few pieces of work on that specific topic that you can find on my Google Scholar. They're a little math-y, you know, maybe a little hard to read in parts, but it's quite rewarding to see these new pieces of technology becoming available to practitioners and people that work on applications as well. So perhaps the attention will move away at some point from LLMs alone to other areas in machine learning and AI that are equally interesting, in my opinion.
01:24:42
Perfect. That's beautiful. Just send me the link; I'm happy to put it into the show notes. I bet there are quite a few people that would be really into reading those things. I'm not big on mathematics, so that's probably way over my head, but that's fine.
01:24:58
Yeah, that was a pleasure. Thank you for being here.
01:25:04
And I hope we see each other somewhere at a Postgres conference; we briefly talked about that before the recording started. So yeah, thank you for being here. And for the audience: I'll see you, and you'll hear me, next week with the next episode. Thank you for being here as well.
01:25:26
Awesome. For the audience: we will be at the Postgres Switzerland conference as sponsors, and we will be giving talks there. So if you come by, feel free to say hi, and we can grab coffee together.
01:25:39
Thank you very much. Perfect. Yes. Thank you. Bye bye.
01:25:44
The Cloud Commute podcast is sponsored by simplyblock, your own elastic block storage engine for the cloud. Get higher IOPS and low, predictable latency while bringing down your total cost of ownership. www.simplyblock.io

