Welcome to this week's episode of Simplyblock's Cloud Commute podcast! Host Chris Engelbert sits down with Shaun Thomas, affectionately known as "Mr. High Availability" in the Postgres community, to discuss his journey from a standard DBA to a leading expert in high availability solutions for Postgres databases.
In this episode of Cloud Commute, Chris and Shaun discuss:
- The evolution of high availability (HA) in Postgres and the challenges of implementation
- The rise of cloud-native Postgres and the role of Kubernetes in high availability
- Tembo’s open-source contributions to Postgres, including PGXN and extensions like pg_vectorize
- The future of Postgres with AI and vectorized searches using pgvector
Interested in learning more about the cloud infrastructure stack, including storage, security, and Kubernetes? Head to our website (www.simplyblock.io/cloud-commute-podcast) for more episodes, and follow us on LinkedIn (www.linkedin.com/company/simplyblock-io). You can also check out the detailed show notes on YouTube (www.youtube.com/watch?v=UYlJfG_1hbs).
You can find Shaun Thomas on X @BonesMoses and LinkedIn: /bonesmoses.
About simplyblock:
Simplyblock is an intelligent database storage orchestrator for IO-intensive workloads in Kubernetes, including databases and analytics solutions. It uses smart NVMe caching to speed up read I/O latency and queries. A single system connects local NVMe disks, GP3 volumes, and S3, making it easier to handle storage capacity and performance. With the benefits of thin provisioning, storage tiering, and volume pooling, your database workloads get better performance at lower cost without changes to existing AWS infrastructure.
👉 Get started with simplyblock: https://www.simplyblock.io/buy-now
🏪 simplyblock AWS Marketplace: https://aws.amazon.com/marketplace/seller-profile?id=seller-fzdtuccq3edzm
01:00:00
But by default, it only kept track
01:00:01
of the first 200.
01:00:04
So if you had more
01:00:05
than that, even if you
01:00:06
were vacuuming
01:00:06
constantly, it would still
01:00:08
bloat like a little bit every day
01:00:10
until your whole disk was used.
01:00:12
So I actually had
01:00:13
to clean all that up
01:00:14
or their system
01:00:15
was going to crash.
01:00:16
They were days away from going
01:00:17
down when I joined.
01:00:23
You're listening to simplyblock's Cloud Commute Podcast,
01:00:25
your weekly 20-minute
01:00:27
podcast about cloud technologies,
01:00:28
Kubernetes, security,
01:00:30
sustainability, and more.
01:00:32
Hello, welcome back to this week's
01:00:34
episode of simplyblock's
01:00:35
Cloud Commute podcast.
01:00:37
This week I have--
01:00:38
no, I'm not saying that.
01:00:39
I'm not saying I have another
01:00:40
incredible guest,
01:00:41
even though I have.
01:00:43
He's already shaking his head.
01:00:46
Nah, I'm not incredible.
01:00:48
He's just known as Mr. High
01:00:50
Availability in the Postgres space
01:00:53
for a very specific reason.
01:00:55
I bet he'll talk
01:00:56
about that in a second.
01:00:58
So hello, Shaun.
01:01:00
Shaun Thomas, thank
01:01:02
you for being here.
01:01:03
And maybe just
01:01:04
introduce yourself real quick.
01:01:06
Who are you?
01:01:07
Well, where are you from?
01:01:08
How did you become
01:01:09
Mr. High Availability?
01:01:12
Yeah, so glad to be here.
01:01:15
Kind of hang out with you.
01:01:16
We talked a little bit.
01:01:17
It's kind of fun.
01:01:19
My background is I
01:01:21
was just a standard DBA,
01:01:25
kind of working on programming
01:01:27
stuff at a company I was at
01:01:28
and our DBA quit, so I
01:01:31
kind of had to pick it up
01:01:33
to make sure we kept going.
01:01:35
And that was back
01:01:36
in the Oracle days.
01:01:37
So I just kind of read
01:01:41
a bunch of Oracle books
01:01:43
to kind of get ready for it.
01:01:45
And then they had some layoffs, so
01:01:47
our whole division got cut.
01:01:49
And then my next job was as a DBA.
01:01:52
And I just kind of
01:01:54
latched onto it from there.
01:01:57
And as far as how I got
01:01:58
into high availability
01:02:00
and where I kind of made that my
01:02:03
calling card was around 2010,
01:02:06
I started working for a company
01:02:07
that was a financial.
01:02:10
And they had to keep their systems
01:02:12
online at all times
01:02:13
because every
01:02:13
second they were down,
01:02:14
they were losing
01:02:15
millions of dollars.
01:02:17
So they actually already had a
01:02:20
high availability stack,
01:02:21
but it was using a bunch of
01:02:22
proprietary tools.
01:02:24
So when I started
01:02:25
working there, I basically
01:02:26
reworked everything.
01:02:28
And we ended up using standard
01:02:31
stack at the time
01:02:32
was pacemaker with
01:02:34
Corosync and DRBD
01:02:37
for distributed
01:02:38
replicating block device
01:02:40
because we didn't really trust
01:02:41
replication back then
01:02:42
because it was still too new.
01:02:45
And we were actually running
01:02:48
Enterprise DB at the time also.
01:02:50
So there's a
01:02:51
bunch of beta features
01:02:52
they had kind of pushed into 9.2
01:02:54
at the time, I think.
01:02:57
And because of that whole process
01:03:00
and not really having
01:03:01
any kind of guide to follow
01:03:02
because there was not really a lot
01:03:05
of high availability tools
01:03:05
back in 2010, 2011.
01:03:08
So I basically wrote up our stack
01:03:10
and the process I used.
01:03:12
And I presented it to
01:03:14
the second Postgres Open
01:03:15
that was in Chicago.
01:03:20
I did a live demo
01:03:21
of the entire stack.
01:03:22
And that video is
01:03:22
probably online somewhere.
01:03:25
But my slides, I think, are also
01:03:26
on the Postgres Wiki.
01:03:28
But after that, I was
01:03:30
approached by Packt,
01:03:32
Packt Publishing, the publisher.
01:03:34
And they wanted me
01:03:34
to write a book on it.
01:03:36
So I did.
01:03:37
And I did it
01:03:39
mainly because I'm like,
01:03:40
I didn't have a book to follow.
01:03:42
Somebody else
01:03:42
that's in this position
01:03:43
really needs to have some kind of
01:03:45
series or a book
01:03:47
or some kind of step-by-step thing
01:03:50
because high availability in
01:03:52
Postgres is really important.
01:03:53
You don't want your
01:03:53
database to go down
01:03:54
in a lot of situations.
01:03:58
And until there's a
01:03:58
lot more tools out there
01:03:59
to cover your bases, being able to
01:04:01
do it is important.
01:04:03
Now there's tons of tools for it.
01:04:05
So it's not a big problem.
01:04:06
But back then, man, oof.
01:04:09
Yeah, yeah.
01:04:10
I mean, you just
01:04:11
mentioned Pacemaker.
01:04:12
I'm not sure when I heard that
01:04:14
thing the last time.
01:04:16
Is that even still a thing?
01:04:17
There's still a
01:04:18
couple companies using it.
01:04:19
Yeah, you wouldn't--
01:04:20
You would be surprised.
01:04:21
I think DFW does in
01:04:25
a couple of spots.
01:04:26
All right.
01:04:26
So--
01:04:29
All right.
01:04:29
I haven't heard about that in at
01:04:31
least a decade, I think.
01:04:33
Everything I've
01:04:33
worked with had different--
01:04:35
or let's say other tools, not
01:04:37
different tools.
01:04:39
Wow.
01:04:41
Yeah, cool.
01:04:42
So you wrote that book.
01:04:44
And you said you came from an
01:04:46
Oracle world, right?
01:04:47
So how did the transition to
01:04:50
Postgres happen?
01:04:51
Was that a choice?
01:04:55
For me, it wasn't
01:04:56
really much of a transition
01:04:57
because, like I said, our DBA quit
01:05:00
at the company I was at.
01:05:02
And it was right
01:05:03
before a bunch of layoffs
01:05:04
that took out
01:05:05
that entire division.
01:05:08
But at the time, I
01:05:10
was like, ooh, Oracle.
01:05:11
I should learn all this stuff.
01:05:13
So the company just had a bunch of
01:05:14
old training materials
01:05:16
lying around.
01:05:17
And there was like three or four
01:05:18
of the huge Oracle books
01:05:20
lying around.
01:05:21
So I spent the next three or four
01:05:23
weeks just reading all
01:05:25
of them back to back.
01:05:26
And I was testing in a cluster
01:05:28
that we had available.
01:05:30
And I set the local
01:05:31
version up in my computer
01:05:32
just to see if it worked and that
01:05:34
and all the stuff
01:05:35
I was trying to learn at the time.
01:05:36
But then the layoffs hit.
01:05:37
So I was like, what do I do now?
01:05:40
And I got another job at a company
01:05:41
that needed a DBA.
01:05:42
And that was MySQL and Postgres.
01:05:45
But that was back when
01:05:46
Postgres was still 6.5.
01:05:49
Back when it crashed if
01:05:51
you looked at it funny.
01:05:54
So I got kind of mad at it.
01:05:55
And I basically stopped using it
01:05:56
from like 2005 to 2010.
01:06:01
Or no, that was--
01:06:02
sorry, from 2001 to 2005.
01:06:06
From 2005, I switched to a company
01:06:08
that they were all Postgres.
01:06:11
So I got the purple Postgres book.
01:06:13
The one that everyone used back
01:06:15
then was I think it was 8.1
01:06:16
or 8.2.
01:06:18
And then I revised
01:06:19
their entire stack also
01:06:22
because they were
01:06:23
having problems with vacuum.
01:06:24
Because back then, the
01:06:26
settings were all wrong.
01:06:28
So you would end up bloating
01:06:29
yourself out of your disk space.
01:06:32
I ended up vacuuming their systems
01:06:34
down from I think
01:06:36
it was 20 gigs down to like 5.
01:06:40
And back then, that
01:06:41
was a lot of disk space.
01:06:43
I was just about
01:06:45
to say that in 2005,
01:06:47
20 gigabytes of
01:06:49
disk space was a lot.
01:06:53
But back then, the
01:06:55
problem with vacuum
01:06:55
was you actually had to set the
01:06:57
size of the free space map.
01:07:00
And the default was way too small.
01:07:02
So what would happen is vacuum
01:07:03
would actually only
01:07:04
keep track of the last 200
01:07:07
unused reusable rows
01:07:10
by default.
01:07:11
But by default, it only kept track
01:07:13
of the first 200.
01:07:15
So if you had more
01:07:16
than that, even if you
01:07:17
were vacuuming
01:07:18
constantly, it would still
01:07:20
bloat like a little bit every day
01:07:21
until your whole disk was used.
01:07:24
So I actually had
01:07:25
to clean all that up
01:07:26
or their system
01:07:27
was going to crash.
01:07:28
They were days away from going
01:07:29
down when I joined.
01:07:32
They had already added all the
01:07:34
disks they could.
01:07:35
And back then, you couldn't just
01:07:36
add virtual disk space.
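For context, the free space map sizing Shaun is describing was governed by settings in postgresql.conf that existed up through PostgreSQL 8.3 (from 8.4 onward the FSM became self-managing and these knobs were removed). The values below are illustrative, not the historical defaults:

```ini
# postgresql.conf (PostgreSQL 8.3 and earlier) -- illustrative values.
# The free space map was a fixed-size in-memory structure: if it was
# sized too small, VACUUM could not remember all of the reusable space,
# so tables bloated steadily no matter how often you vacuumed.
max_fsm_pages = 1000000     # total free pages tracked across the cluster
max_fsm_relations = 2000    # number of tables and indexes tracked
```

Raising these (and restarting) let VACUUM actually reclaim space again, which is essentially the cleanup Shaun describes doing by hand.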
01:07:38
I know those situations, not in
01:07:42
the Postgres or database
01:07:43
space, but in the software
01:07:45
development space where--
01:07:47
same thing, I literally joined
01:07:49
days before it all
01:07:50
would fall apart.
01:07:55
Let's say those are not
01:07:56
the best days to join.
01:07:59
Hey, that's why
01:08:00
they hired you, right?
01:08:02
Exactly.
01:08:03
All right.
01:08:04
So let's talk a little
01:08:07
bit about these days.
01:08:09
Right now, you're with Tembo.
01:08:12
And you just have
01:08:14
this very nice blog post
01:08:17
that blew up on Hacker News for
01:08:19
all the wrong reasons.
01:08:22
Well, I mean, we created it for
01:08:24
all the right reasons.
01:08:26
And so let me just
01:08:28
start on Tembo a little bit.
01:08:29
So Tembo is like they
01:08:33
are all in on Postgres.
01:08:35
We are ridiculously all in.
01:08:38
Basically, everything we
01:08:39
do is all open sourced.
01:08:40
You can go to Tembo.io on GitHub.
01:08:42
And basically, our
01:08:43
entire stack is there.
01:08:46
And we even just
01:08:47
released our on-prem.
01:08:50
So you can actually use our stack
01:08:51
on your local system
01:08:52
and basically have a Kubernetes
01:08:55
cloud management
01:08:56
thing for all the
01:08:57
clusters you want to manage.
01:08:59
And it'll just be
01:09:00
our stack of tools.
01:09:02
And the main calling card of Tembo
01:09:04
is probably our--
01:09:05
if you go to trunk,
01:09:07
I think it's called PGT.dev.
01:09:09
We just keep track of
01:09:10
a bunch of extensions.
01:09:11
And it's got a command line tool
01:09:13
to install them,
01:09:14
kind of like a PGXN.
01:09:17
And we're so kind of into this
01:09:19
that we actually
01:09:20
hired the guy who basically
01:09:22
maintained PGXN, David Wheeler.
01:09:26
Because we were like, we need to
01:09:28
kind of hit the extension
01:09:31
drum.
01:09:32
And we're very glad he's like
01:09:35
re-standardizing PGXN too.
01:09:38
He's starting a whole initiative.
01:09:40
And he's got a lot
01:09:42
of buy-in from tons
01:09:44
of different
01:09:44
committers and devs and people
01:09:47
who are really pushing it.
01:09:49
And we want to get--
01:09:51
maybe we'll create the gold
01:09:52
standard of extension networks.
01:09:55
Because the idea is to get it all
01:09:58
so that it's packaged, right?
01:10:00
Kind of like a Debian or an RPM or
01:10:02
whatever package system
01:10:03
you want to use.
01:10:05
It'll just install the package on
01:10:07
your Postgres wherever it is.
01:10:08
Like the source install, if it's
01:10:09
like a package install,
01:10:11
or if it's something
01:10:12
with on your Mac, whatever.
01:10:16
So he's working on that really.
01:10:17
And he's done some demos that are
01:10:18
very impressive.
01:10:19
And it looks like it'll actually
01:10:20
be a great advancement.
01:10:23
But Tembo is-- it's all about open
01:10:28
source Postgres.
01:10:29
And our tools kind of show that.
01:10:32
Like if you've ever heard of Adam
01:10:34
Hendel, he goes by Chuck.
01:10:36
But if you heard of
01:10:37
PGMQ or PG Vectorize,
01:10:41
which kind of makes PG Vector a
01:10:43
little easier to use,
01:10:45
those tools are all
01:10:46
coming from us, basically.
01:10:49
So we're putting our money where
01:10:51
our mouth is, right?
01:10:57
All right.
01:10:59
That's why I joined them.
01:11:00
Because I kept seeing
01:11:00
them pop up on Twitter.
01:11:01
And I'm like, man,
01:11:02
these guys really--
01:11:03
they're really
01:11:03
dedicated to this whole thing.
01:11:10
Yeah, cool.
01:11:11
So back to PG and
01:11:15
high availability.
01:11:17
Why would I need that?
01:11:19
I mean, I know.
01:11:20
But maybe just give the audience a
01:11:22
little bit of a clue.
01:11:26
So high availability--
01:11:29
and I kind of implied this
01:11:30
when I was talking about the
01:11:32
financial company, right?
01:11:33
The whole idea is to make sure
01:11:34
Postgres never goes down.
01:11:36
But there's so much more to it.
01:11:38
I've done conferences.
01:11:41
And I've done webinars.
01:11:42
And I've done trainings.
01:11:44
And I've done the book.
01:11:47
Just covering that
01:11:48
topic is it's essentially
01:11:50
an infinite font of just all the
01:11:53
different ways you can do it,
01:11:54
all the different prerequisites
01:11:55
you need to fulfill,
01:11:56
all the different
01:11:57
things you need to set up
01:12:00
to make it work properly.
01:12:01
But the whole point is
01:12:02
keep your Postgres up.
01:12:04
But you also have to
01:12:05
define what that means.
01:12:07
Where do you put
01:12:07
your Postgres instances?
01:12:09
Where do you put your replicas?
01:12:11
How do you get to them?
01:12:13
Do you need an intermediate
01:12:15
abstraction layer
01:12:17
so that you can connect to that?
01:12:18
And it'll kind of decide where to
01:12:20
send you afterwards
01:12:21
so you don't have any outages as
01:12:23
far as routing is concerned?
01:12:26
It's a very deep topic.
01:12:29
And it's easy to get wrong.
01:12:32
And a lot of the tools out there,
01:12:35
they don't
01:12:35
necessarily get it wrong.
01:12:36
But they expect the
01:12:37
user to get it right.
01:12:40
One of the reasons my book did so
01:12:42
well in certain circles
01:12:43
is because if you want to set up
01:12:45
EFM or repmgr or Patroni
01:12:48
or some other tool, you have to
01:12:51
follow very closely
01:12:53
and know how the tool
01:12:53
works extremely well.
01:12:55
You have to be very familiar with
01:12:57
the documentation.
01:12:58
You can't just follow step by step
01:13:00
and then expect it to
01:13:01
work in a lot of cases.
01:13:02
Now, there's a lot of edge cases
01:13:03
you have to account for.
01:13:04
You have to know
01:13:05
why and the theories
01:13:06
behind the high availability and
01:13:08
how it works a certain way
01:13:11
to really deploy it properly.
01:13:13
So even as a consultant
01:13:16
when I was working at EDB
01:13:18
and a second quadrant, it's easy
01:13:22
to give a stack to a customer
01:13:25
and they can implement it with
01:13:27
your recommendations.
01:13:28
And you can even
01:13:28
set it up for them.
01:13:29
There's always some kind of edge
01:13:30
case that you didn't think of.
01:13:33
So the issue with Postgres, in
01:13:37
kind of my opinion,
01:13:38
is it gives you a lot of tools to
01:13:40
build it yourself,
01:13:42
but it expects you
01:13:42
to build it yourself.
01:13:44
And even the other stack tools,
01:13:45
like I had mentioned earlier,
01:13:46
like repmgr
01:13:46
or EFM or Patroni,
01:13:50
there's also pg_auto_failover,
01:13:52
another one that came out
01:13:53
recently.
01:13:55
They work, but you've
01:13:57
got to install them.
01:13:58
And you really do need access to
01:14:00
an expert that can come in
01:14:01
if something goes wrong.
01:14:03
Because if something goes wrong,
01:14:04
you're kind of on your own
01:14:05
in a lot of ways.
01:14:06
Postgres doesn't really have an
01:14:07
inherent integral way
01:14:09
of managing itself as a cluster.
01:14:12
It's more of like a
01:14:13
database that just happens
01:14:14
to be able to talk to other nodes
01:14:15
to keep them up to date
01:14:17
with sync and whatnot.
01:14:20
So it's important, but it's also
01:14:23
hard to do right.
01:14:25
I think you mentioned one
01:14:27
important thing.
01:14:27
It is important to
01:14:29
upfront define your goals.
01:14:34
How much uptime
01:14:36
do you really need?
01:14:38
Because one thing that not only
01:14:40
with Postgres, but in general,
01:14:42
whenever we talk about failure
01:14:45
tolerance systems,
01:14:47
high availability, all
01:14:48
those kinds of things,
01:14:49
what a lot of
01:14:50
people seem to forget
01:14:52
is that high
01:14:53
availability or fault tolerance
01:14:56
is a trade-off between how much
01:14:58
time and money do I invest
01:15:01
and how much money do I lose if
01:15:03
something really, well,
01:15:05
you could say,
01:15:06
s***t hits the fan, right?
01:15:08
Exactly.
01:15:09
And that's the thing.
01:15:11
Companies like the financial
01:15:12
company I worked at,
01:15:13
they took high
01:15:14
availability to a fault.
01:15:17
They had two systems in
01:15:20
their main data center.
01:15:21
They had two systems in their
01:15:22
disaster recovery data center
01:15:24
that were fully
01:15:24
synced and up to date.
01:15:26
They had backups that were on both
01:15:29
local systems taken
01:15:30
every day that was also
01:15:31
shipped to a system that
01:15:33
had seven days' worth locally.
01:15:35
And that was sent to tape, which
01:15:37
was then sent to Glacier,
01:15:40
which according to SEC rules, they
01:15:41
had to keep for seven years.
01:15:42
So someone could
01:15:44
come into our systems
01:15:46
and maliciously erase
01:15:47
literally everything,
01:15:48
and we'd be back up in an hour.
01:15:51
It was very resilient.
01:15:53
And part of that was our design
01:15:55
and the amount of money
01:15:57
we dedicated
01:15:57
toward it because that
01:15:58
was a very expensive
01:15:59
deployment because that's
01:16:01
at least 10 servers right there.
01:16:05
But also, when you say you could
01:16:07
be back up in an hour,
01:16:10
the question is, how much money do
01:16:11
you lose in that hour still?
01:16:14
Well, like I said, that was like
01:16:15
someone actually walks in
01:16:17
and literally
01:16:17
smashes all the servers.
01:16:19
We have to go from a backup and
01:16:22
actually rebuild everything
01:16:22
from scratch.
01:16:24
In most cases, we'd be up--
01:16:26
and this is where
01:16:26
your RTO and RPO come in,
01:16:29
the recovery time objective and
01:16:30
your recovery point objective.
01:16:32
Basically, how much do
01:16:33
you want to spend to say,
01:16:35
I want to be down
01:16:36
for one minute or less?
01:16:37
Or if I am down
01:16:39
for that one minute,
01:16:40
how much data will I lose?
01:16:42
Because the amount
01:16:43
of money you spend
01:16:43
or the amount of resources you
01:16:44
dedicate toward that thing
01:16:46
will determine the end result of
01:16:48
how much data you might lose
01:16:49
or how much money you'll need to
01:16:52
spend to make sure you're
01:16:52
down for less than a minute.
01:16:55
That kind of thing.
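The RTO trade-off Shaun describes can be put into back-of-the-envelope numbers. Everything below is hypothetical, just to show the shape of the calculation:

```python
# Back-of-the-envelope RTO trade-off. All figures are made up for
# illustration: infrastructure costs, loss rate, and incident counts
# will differ wildly between businesses.

def downtime_cost(rto_minutes: float, loss_per_minute: float,
                  incidents_per_year: int) -> float:
    """Expected annual revenue lost to outages for a given recovery time objective."""
    return rto_minutes * loss_per_minute * incidents_per_year

# Two hypothetical HA tiers: a cheap setup with a 60-minute RTO,
# and an expensive multi-replica stack with a 1-minute RTO.
basic = 10_000 + downtime_cost(rto_minutes=60, loss_per_minute=1_000,
                               incidents_per_year=2)
premium = 150_000 + downtime_cost(rto_minutes=1, loss_per_minute=1_000,
                                  incidents_per_year=2)

print(basic)    # total annual cost of the cheap tier
print(premium)  # total annual cost of the expensive tier
```

With these particular numbers the expensive tier doesn't pay off; crank the loss rate up to the "millions per second" of Shaun's financial-industry example and it immediately does, which is exactly the point about matching spend to what downtime actually costs you.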
01:16:56
I think that becomes more
01:16:57
important in the cloud age.
01:17:00
So perfect bridge to cloud,
01:17:02
Postgres and cloud, perfect.
01:17:04
So you said setting
01:17:05
up HA is complicated
01:17:07
because you have to
01:17:08
install the tools.
01:17:09
You have to configure them.
01:17:11
These days, when you
01:17:12
go and deploy Postgres
01:17:15
on something like Kubernetes, you
01:17:17
would have an operator
01:17:18
claiming at least doing
01:17:19
all the magic for you.
01:17:21
What is your opinion on the magic?
01:17:23
Yeah, so my opinion on
01:17:26
that is it evolved a lot.
01:17:27
Back when I first started seeing
01:17:29
containerized systems
01:17:30
like Docker and that kind of
01:17:33
thing, my opinion was,
01:17:36
I don't know if I'd run a
01:17:37
production system
01:17:38
in a container, right?
01:17:39
Because it just
01:17:39
seems a little shady.
01:17:41
But that was 10 years ago or more.
01:17:46
Now that Kubernetes
01:17:47
tools and that kind of thing
01:17:48
have matured a lot, what
01:17:50
you get out of this now
01:17:51
is you get a level of automation
01:17:53
that just is not
01:17:54
possible using
01:17:55
pretty much anything else.
01:17:59
And I think what
01:18:00
really sold it to me was--
01:18:02
so you may have heard
01:18:04
of Gabriele Bartolini.
01:18:06
He basically heads up the team
01:18:09
that writes and maintains
01:18:13
Cloud Native Postgres, the Cloud
01:18:15
Native PG operator.
01:18:17
So we'll talk about operators
01:18:18
probably a bit later.
01:18:19
But the point of
01:18:21
that was back when--
01:18:23
and 2ndQuadrant was before
01:18:25
they were bought by EDB,
01:18:27
we were selling our BDR tool for
01:18:31
bi-directional application
01:18:33
for Postgres, right?
01:18:34
So multi-master.
01:18:36
And we needed a way to
01:18:39
put that in a Cloud service
01:18:41
for obvious purposes so we could
01:18:43
sell it to customers.
01:18:45
And that meant we
01:18:45
needed an operator.
01:18:47
Well, before Cloud
01:18:50
Native Postgres existed,
01:18:51
there was the BDR
01:18:53
operator that we were circling,
01:18:55
cycling internally for customers.
01:18:58
And one day while
01:19:01
we were in Italy--
01:19:02
because every employee who worked
01:19:04
at 2ndQuadrant
01:19:05
got sent to Italy
01:19:06
for a couple of weeks
01:19:07
to get oriented with the team,
01:19:09
that kind of thing.
01:19:10
And during that time
01:19:11
when I was there in 2020,
01:19:14
I think I was there for February,
01:19:15
for the first two
01:19:16
weeks of February.
01:19:17
He demoed that.
01:19:20
And it kind of blew me away.
01:19:23
We were using other tools to
01:19:25
deploy containers.
01:19:27
And it was basically Ansible to
01:19:30
automate the deployment
01:19:32
with Terraform.
01:19:33
And then you kind of set
01:19:34
everything up and then
01:19:35
deploy everything.
01:19:36
And it takes minutes to
01:19:39
set up all the packages
01:19:40
and get everything deployed and
01:19:42
reconfigure everything.
01:19:43
And then you have to wait for
01:19:44
syncs and whatnot
01:19:46
to make sure everything's proper.
01:19:48
On someone's laptop, they set up
01:19:51
Kubernetes Docker deployment.
01:19:53
Kind, I think we were
01:19:54
using at that point,
01:19:56
Kubernetes in Docker.
01:19:58
And in less than a
01:20:01
minute, he had on his laptop
01:20:04
set up a full Kubernetes cluster
01:20:06
of three replicating,
01:20:09
bidirectional replicating, so
01:20:10
three multi-master nodes
01:20:12
of Postgres on his
01:20:13
laptop in less than a minute.
01:20:15
And I was just
01:20:16
like, my mind was blown.
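For anyone who wants to reproduce the spirit of that demo today, a minimal CloudNativePG manifest looks roughly like this. The name and storage size are placeholders, and note that CloudNativePG proper manages single-primary clusters with streaming replicas, not the multi-master BDR setup described in the story:

```yaml
# Assumes a Kubernetes cluster (e.g. kind) with the CloudNativePG
# operator already installed; apply with: kubectl apply -f cluster.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: demo-pg          # placeholder name
spec:
  instances: 3           # one primary plus two streaming replicas
  storage:
    size: 1Gi            # each instance gets its own persistent volume claim
```

The operator handles bootstrapping, replication, and failover from that handful of lines, which is the automation gap versus the Ansible/Terraform workflow just described.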
01:20:19
And the thing is,
01:20:21
basically, it's a new concept.
01:20:24
The data is what matters.
01:20:26
The nodes themselves are
01:20:28
completely unimportant.
01:20:30
And that's why to kind of bring
01:20:32
this back around,
01:20:34
when Cloud Native Postgres was
01:20:35
released by Enterprise DB
01:20:37
kind of as an open
01:20:39
source tool for Postgres
01:20:41
and not the
01:20:41
bidirectional replication
01:20:42
stuff for just Postgres.
01:20:45
The reason that was important was
01:20:46
because it's an ethos.
01:20:49
The point is your compute nodes
01:20:52
throw them away.
01:20:53
They don't matter.
01:20:54
If one goes down, you
01:20:55
provision a new one.
01:20:57
If you need to upgrade your
01:20:59
tooling or the packages,
01:21:02
you throw away the
01:21:03
old container image,
01:21:05
you bring up a new one.
01:21:06
The important part is your data.
01:21:08
And as long as your data is on
01:21:10
your persistent volume claim
01:21:12
or whatever you provision that as,
01:21:15
the container itself, the version
01:21:17
of Postgres you're
01:21:18
running, those aren't
01:21:19
nearly as important.
01:21:21
So it complicates
01:21:24
debugging to a certain extent.
01:21:26
And we can kind of talk
01:21:27
about that maybe later.
01:21:28
But the important part is it
01:21:31
brings high availability
01:21:32
to a level that can't really be
01:21:34
described using the old methods.
01:21:36
Because the old method was you
01:21:39
create two or three replicas.
01:21:42
And if one goes down, you've got a
01:21:43
monitoring system
01:21:44
that switches over to
01:21:44
one of the alternates.
01:21:46
And then the other one might come
01:21:48
back or might not.
01:21:49
And then you rebuild it if it
01:21:50
does, that kind of thing.
01:21:52
With the Kubernetes approach or
01:21:54
the container approach,
01:21:55
as long as your
01:21:56
storage wasn't corrupted,
01:21:59
you can just bring
01:22:00
up a new container
01:22:00
to represent that storage.
01:22:02
And you can
01:22:03
actually have a situation
01:22:04
where the primary goes down
01:22:05
because maybe it
01:22:08
got OOM killed for some reason.
01:22:11
It can actually go down, get a new
01:22:13
container provisioned,
01:22:14
and come back up
01:22:15
before the monitors even
01:22:17
notice that there was an outage
01:22:19
and the switch to a replica
01:22:20
and promote it.
01:22:22
There's a whole
01:22:24
mechanism of systems
01:22:25
in there to kind of reduce the
01:22:27
amount of timeline switches
01:22:28
and other kind of complications
01:22:30
behind the scenes.
01:22:31
So you have a
01:22:32
cohesive, stable timeline.
01:22:37
You maximize your uptime.
01:22:39
They've got layers to
01:22:40
redirect connections
01:22:42
from the outside world
01:22:43
through either Traefik
01:22:45
or some other kind of proxy to get
01:22:51
into your actual cluster.
01:22:53
You always get an
01:22:54
endpoint somehow.
01:22:56
And that's something
01:22:57
that could go horribly wrong,
01:22:58
but that's true for anything.
01:23:00
But the ethos of your machines
01:23:03
aren't important.
01:23:05
It spoke to me a little bit
01:23:06
because it brings you
01:23:07
to a level that, sure, bare
01:23:09
hardware is great.
01:23:11
And I actually prefer it.
01:23:12
I've got servers in my
01:23:13
basement specifically
01:23:14
for testing clusters
01:23:16
and Postgres and whatnot.
01:23:18
But if you have the
01:23:20
luxury of provisioning
01:23:22
what you need at the time, if I
01:23:26
want more compute nodes,
01:23:29
like I said, throw away my image, bring
01:23:30
up a new one that's
01:23:31
got more
01:23:32
resources allocated to it,
01:23:33
suddenly I've grown vertically.
01:23:36
And that's something you can't
01:23:38
really do with bare hardware,
01:23:39
at least not very easily.
01:23:42
So then I was like, well, maybe
01:23:43
this whole container thing
01:23:43
isn't really a problem, right?
01:23:46
So yeah, it's all because of my
01:23:49
time in 2ndQuadrant
01:23:50
and Gabriele's team that high
01:23:54
availability does
01:23:55
belong in the cloud.
01:23:56
And you can run production in the
01:23:58
cloud on Kubernetes
01:23:59
and containers.
01:24:01
And in fact, I encourage it.
01:24:03
I love that.
01:24:03
I love that.
01:24:04
I also think high
01:24:05
availability in cloud,
01:24:07
and especially cloud
01:24:08
native are concepts
01:24:09
that are perfectly in line and
01:24:11
perfectly in sync.
01:24:13
Unfortunately, we're out of time.
01:24:16
I didn't want to
01:24:16
stop you, but I think
01:24:18
I have to invite you again and
01:24:21
keep talking about that.
01:24:23
But one last question.
01:24:25
One last question.
01:24:25
By the way, I love when
01:24:28
you said that containers
01:24:29
were a new thing
01:24:30
like 10 years ago,
01:24:31
except for you came from the
01:24:32
Solaris or BSD world where
01:24:34
those things were--
01:24:35
Jails!
01:24:36
But it's still different, right?
01:24:39
You didn't have this
01:24:40
orchestration layer on top.
01:24:41
The whole ecosystem evolved very
01:24:43
differently in the Linux space.
01:24:45
Anyway, last question.
01:24:50
What do you think is
01:24:50
the next big thing?
01:24:51
What is upcoming
01:24:53
in the Postgres,
01:24:54
but Linux, the
01:24:55
container world, what do you
01:24:56
think is amazing on the horizon?
01:25:01
I mean, I hate to be cliche here,
01:25:04
but it's got to be AI.
01:25:07
If you look at pgvector
01:25:09
it's basically
01:25:12
allowing you to do vectorized
01:25:13
similarity searches right
01:25:15
in Postgres.
01:25:17
And I think Timescale even
01:25:18
released pgvectorscale,
01:25:21
which is an extension that makes
01:25:22
pgvector even better.
01:25:24
It makes it apparently faster than
01:25:25
dedicated vector
01:25:26
databases like Pinecone.
01:25:29
And it's just an
01:25:31
area that if you're
01:25:32
going to do any kind of retrieval-
01:25:35
augmented generation,
01:25:36
like RAG searches, or if you're
01:25:39
doing any LLM work at all,
01:25:42
if you're building
01:25:42
chatbots, or if you're just
01:25:44
doing, like I said,
01:25:46
augmented searches,
01:25:47
any of that kind of
01:25:47
work, you're going
01:25:49
to be wanting your data that's in
01:25:51
Postgres already, right?
01:25:52
You're going to want to make that
01:25:54
available to your AI.
01:25:57
And the easiest way to
01:25:58
do that is with pgvector.
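As a rough sketch of what that looks like in practice (the table, column, and vector values here are invented, and a real embedding column would have hundreds or thousands of dimensions), pgvector adds a vector type plus distance operators you can order by:

```sql
CREATE EXTENSION IF NOT EXISTS vector;

-- Hypothetical table of document chunks with tiny 3-dimensional embeddings.
CREATE TABLE documents (
    id bigserial PRIMARY KEY,
    content text,
    embedding vector(3)
);

INSERT INTO documents (content, embedding) VALUES
    ('postgres ha guide', '[0.1, 0.9, 0.2]'),
    ('kubernetes intro',  '[0.8, 0.1, 0.3]');

-- Nearest neighbor by cosine distance (the <=> operator);
-- <-> is Euclidean distance and <#> is negative inner product.
SELECT content
FROM documents
ORDER BY embedding <=> '[0.1, 0.8, 0.25]'
LIMIT 1;
```

A RAG app embeds the user's question with the same model that produced the stored embeddings, runs a query like the last one, and feeds the retrieved rows to the LLM as context.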
01:26:00
Tembo even wrote an extension we
01:26:02
call pg_vectorize,
01:26:03
which automatically maintains your
01:26:05
embeddings, which
01:26:06
is how you kind of interface your
01:26:08
searches with the text.
01:26:09
And then you can feed
01:26:10
that back into an LLM.
01:26:11
It also has the
01:26:13
ability to do that for you.
01:26:14
Like it can send
01:26:15
messages directly to OpenAI.
01:26:17
We can also interface
01:26:18
with arbitrary paths
01:26:20
so you can set up an Ollama or
01:26:23
something on a server
01:26:25
or locally.
01:26:26
And then you can set
01:26:27
that to be the end target.
01:26:28
So you can even keep your messages
01:26:30
from hitting external resources
01:26:33
like Microsoft or OpenAI
01:26:35
or whatever, just
01:26:36
do it all locally.
01:26:39
And that's all very important.
01:26:41
So that I think is going to be--
01:26:44
it's whatever
01:26:44
one-- not either one,
01:26:45
but a lot of people
01:26:46
are focusing on it.
01:26:47
And a lot of
01:26:47
people find it annoying.
01:26:48
It's another AI thing, right?
01:26:50
But I wrote two blog posts on this
01:26:52
where I wrote a RAG app using some
01:26:56
Python and pgvector.
01:26:58
And then I wrote a second one
01:26:59
where I used pg_vectorize
01:27:02
and I cut my
01:27:03
Python code by like 90%.
01:27:05
And it just
01:27:06
basically talks to Postgres.
01:27:07
Postgres is doing it all.
01:27:09
And that's because of the
01:27:10
extension ecosystem, right?
01:27:12
And that's one of
01:27:12
the reasons Postgres
01:27:13
is kind of on the top of
01:27:15
everyone's mind right now
01:27:15
because it's leading the charge.
01:27:18
And it's bringing a lot
01:27:19
of people in that may not
01:27:21
have been interested before.
01:27:23
I love that.
01:27:24
And I think that's a perfect
01:27:25
sentence to end the show.
01:27:30
The Postgres ecosystem or
01:27:32
extension system
01:27:33
is just incredible.
01:27:34
And there's so much stuff that
01:27:36
we've seen so far
01:27:37
and so much more stuff to come.
01:27:40
I couldn't agree more.
01:27:42
Yeah, it's just
01:27:42
the beginning, man.
01:27:44
Yeah, let's hope
01:27:44
that AI is not going
01:27:46
to try to build our HA systems.
01:27:49
And I'm happy.
01:27:52
Maybe not yet, yeah.
01:27:54
Yeah, not yet at least.
01:27:55
Exactly.
01:27:56
All right, thank
01:27:57
you for being here.
01:27:58
It was a pleasure.
01:28:00
As I said, I think I have to
01:28:01
invite you again somewhere
01:28:02
in the future.
01:28:03
More than willing.
01:28:07
And to the audience, thank you for
01:28:09
listening in again.
01:28:11
I hope you come back next week.
01:28:13
And thank you very much.
01:28:16
Take care.
01:28:18
The Cloud Commute podcast is sponsored by
01:28:20
simplyblock, your own elastic
01:28:21
block storage engine for the cloud.
01:28:23
Get higher IOPS and low, predictable
01:28:25
latency while bringing down your
01:28:26
total cost of ownership.
01:28:28
www.simplyblock.io

