How I designed PostgreSQL High Availability with Shaun Thomas from Tembo
Cloud Commute, June 21, 2024
17
00:28:31 | 26.11 MB

How I designed PostgreSQL High Availability with Shaun Thomas from Tembo

Welcome to this week's episode of Simplyblock's Cloud Commute podcast! Host Chris Engelbert sits down with Shaun Thomas, affectionately known as "Mr. High Availability" in the Postgres community, to discuss his journey from a standard DBA to a leading expert in high availability solutions for Postgres databases.

In this episode of Cloud Commute, Chris and Shaun discuss:

  • The evolution of high availability (HA) in Postgres and the challenges of implementation
  • The rise of cloud-native Postgres and the role of Kubernetes in high availability
  • Tembo’s open-source contributions to Postgres, including PGXN and extensions like pg_vectorize
  • The future of Postgres with AI and vectorized searches using pgvector

Interested in learning more about the cloud infrastructure stack, like storage, security, and Kubernetes? Head to our website (www.simplyblock.io/cloud-commute-podcast) for more episodes, and follow us on LinkedIn (www.linkedin.com/company/simplyblock-io). You can also check out the detailed show notes on YouTube (www.youtube.com/watch?v=UYlJfG_1hbs).

You can find Shaun Thomas on X (@BonesMoses) and LinkedIn (/bonesmoses).

About simplyblock:

Simplyblock is an intelligent database storage orchestrator for IO-intensive workloads in Kubernetes, including databases and analytics solutions. It uses smart NVMe caching to reduce read I/O latency and speed up queries. A single system connects local NVMe disks, GP3 volumes, and S3, making it easier to handle storage capacity and performance. With the benefits of thin provisioning, storage tiering, and volume pooling, your database workloads get better performance at lower cost without changes to existing AWS infrastructure.

👉 Get started with simplyblock: https://www.simplyblock.io/buy-now

🏪 simplyblock AWS Marketplace: https://aws.amazon.com/marketplace/seller-profile?id=seller-fzdtuccq3edzm


01:00:00
But by default, it only kept track

01:00:01
of the first 200.

01:00:04
So if you had more

01:00:05
than that, even if you

01:00:06
were vacuuming

01:00:06
constantly, it would still

01:00:08
bloat like a little bit every day

01:00:10
until your whole disk was used.

01:00:12
So I actually had

01:00:13
to clean all that up

01:00:14
or their system

01:00:15
was going to crash.

01:00:16
They were days away from going

01:00:17
down when I joined.

01:00:23
You're listening to simplyblock's Cloud Commute Podcast,

01:00:25
your weekly 20-minute

01:00:27
podcast about cloud technologies,

01:00:28
Kubernetes, security,

01:00:30
sustainability, and more.

01:00:32
Hello, welcome back to this week's

01:00:34
episode of simplyblock's

01:00:35
Cloud Commute podcast.

01:00:37
This week I have--

01:00:38
no, I'm not saying that.

01:00:39
I'm not saying I have another

01:00:40
incredible guest,

01:00:41
even though I have.

01:00:43
He's already shaking his head.

01:00:46
Nah, I'm not incredible.

01:00:48
He's just known as Mr. High

01:00:50
Availability in the Postgres space

01:00:53
for a very specific reason.

01:00:55
I bet he'll talk

01:00:56
about that in a second.

01:00:58
So hello, Shaun.

01:01:00
Shaun Thomas, thank

01:01:02
you for being here.

01:01:03
And maybe just

01:01:04
introduce yourself real quick.

01:01:06
Who are you?

01:01:07
Well, where are you from?

01:01:08
How did you become

01:01:09
Mr. High Availability?

01:01:12
Yeah, so glad to be here.

01:01:15
Kind of hang out with you.

01:01:16
We talked a little bit.

01:01:17
It's kind of fun.

01:01:19
My background is I

01:01:21
was just a standard DBA,

01:01:25
kind of working on programming

01:01:27
stuff at a company I was at

01:01:28
and our DBA quit, so I

01:01:31
kind of had to pick it up

01:01:33
to make sure we kept going.

01:01:35
And that was back

01:01:36
in the Oracle days.

01:01:37
So I just kind of read

01:01:41
a bunch of Oracle books

01:01:43
to kind of get ready for it.

01:01:45
And then they had some layoffs, so

01:01:47
our whole division got cut.

01:01:49
And then my next job was as a DBA.

01:01:52
And I just kind of

01:01:54
latched onto it from there.

01:01:57
And as far as how I got

01:01:58
into high availability

01:02:00
and where I kind of made that my

01:02:03
calling card was around 2010,

01:02:06
I started working for a company

01:02:07
that was a financial company.

01:02:10
And they had to keep their systems

01:02:12
online at all times

01:02:13
because every

01:02:13
second they were down,

01:02:14
they were losing

01:02:15
millions of dollars.

01:02:17
So they actually already had a

01:02:20
high availability stack,

01:02:21
but it was using a bunch of

01:02:22
proprietary tools.

01:02:24
So when I started

01:02:25
working there, I basically

01:02:26
reworked everything.

01:02:28
And we ended up using standard

01:02:31
stack at the time

01:02:32
was Pacemaker with

01:02:34
Corosync and DRBD

01:02:37
for Distributed

01:02:38
Replicated Block Device

01:02:40
because we didn't really trust

01:02:41
replication back then

01:02:42
because it was still too new.

01:02:45
And we were actually running

01:02:48
EnterpriseDB at the time also.

01:02:50
So there's a

01:02:51
bunch of beta features

01:02:52
they had kind of pushed into 9.2

01:02:54
at the time, I think.

01:02:57
And because of that whole process

01:03:00
and not really having

01:03:01
any kind of guide to follow

01:03:02
because there was not really a lot

01:03:05
of high availability tools

01:03:05
back in 2010, 2011.

01:03:08
So I basically wrote up our stack

01:03:10
and the process I used.

01:03:12
And I presented it to

01:03:14
the second Postgres Open

01:03:15
that was in Chicago.

01:03:20
I did a live demo

01:03:21
of the entire stack.

01:03:22
And that video is

01:03:22
probably online somewhere.

01:03:25
But my slides, I think, are also

01:03:26
on the Postgres Wiki.

01:03:28
But after that, I was

01:03:30
approached by Packt,

01:03:32
Packt Publishing, the publisher.

01:03:34
And they wanted me

01:03:34
to write a book on it.

01:03:36
So I did.

01:03:37
And I did it

01:03:39
mainly because I'm like,

01:03:40
I didn't have a book to follow.

01:03:42
Somebody else

01:03:42
that's in this position

01:03:43
really needs to have some kind of

01:03:45
series or a book

01:03:47
or some kind of step-by-step thing

01:03:50
because high availability in

01:03:52
Postgres is really important.

01:03:53
You don't want your

01:03:53
database to go down

01:03:54
in a lot of situations.

01:03:58
And until there's a

01:03:58
lot more tools out there

01:03:59
to cover your bases, being able to

01:04:01
do it is important.

01:04:03
Now there's tons of tools for it.

01:04:05
So it's not a big problem.

01:04:06
But back then, man, oof.

01:04:09
Yeah, yeah.

01:04:10
I mean, you just

01:04:11
mentioned Pacemaker.

01:04:12
I'm not sure when I heard that

01:04:14
thing the last time.

01:04:16
Is that even still a thing?

01:04:17
There's still a

01:04:18
couple companies using it.

01:04:19
Yeah, you wouldn't--

01:04:20
You would be surprised.

01:04:21
I think DFW does in

01:04:25
a couple of spots.

01:04:26
All right.

01:04:26
So--

01:04:29
All right.

01:04:29
I haven't heard about that in at

01:04:31
least a decade, I think.

01:04:33
Everything I've

01:04:33
worked with had different--

01:04:35
or let's say other tools, not

01:04:37
different tools.

01:04:39
Wow.

01:04:41
Yeah, cool.

01:04:42
So you wrote that book.

01:04:44
And you said you came from an

01:04:46
Oracle world, right?

01:04:47
So how did the transition to

01:04:50
Postgres happen?

01:04:51
Was that a choice?

01:04:55
For me, it wasn't

01:04:56
really much of a transition

01:04:57
because, like I said, our DBA quit

01:05:00
at the company I was at.

01:05:02
And it was right

01:05:03
before a bunch of layoffs

01:05:04
that took out

01:05:05
that entire division.

01:05:08
But at the time, I

01:05:10
was like, ooh, Oracle.

01:05:11
I should learn all this stuff.

01:05:13
So the company just had a bunch of

01:05:14
old training materials

01:05:16
lying around.

01:05:17
And there was like three or four

01:05:18
of the huge Oracle books

01:05:20
lying around.

01:05:21
So I spent the next three or four

01:05:23
weeks just reading all

01:05:25
of them back to back.

01:05:26
And I was testing in a cluster

01:05:28
that we had available.

01:05:30
And I set the local

01:05:31
version up in my computer

01:05:32
just to see if it worked and that

01:05:34
and all the stuff

01:05:35
I was trying to learn at the time.

01:05:36
But then the layoffs hit.

01:05:37
So I was like, what do I do now?

01:05:40
And I got another job at a company

01:05:41
that needed a DBA.

01:05:42
And that was MySQL and Postgres.

01:05:45
But that was back when

01:05:46
Postgres was still 6.5.

01:05:49
Back when it crashed if

01:05:51
you looked at it funny.

01:05:54
So I got kind of mad at it.

01:05:55
And I basically stopped using it

01:05:56
from like 2005 to 2010.

01:06:01
Or no, that was--

01:06:02
sorry, from 2001 to 2005.

01:06:06
From 2005, I switched to a company

01:06:08
that they were all Postgres.

01:06:11
So I got the purple Postgres book.

01:06:13
The one that everyone used back

01:06:15
then was I think it was 8.1

01:06:16
or 8.2.

01:06:18
And then I revised

01:06:19
their entire stack also

01:06:22
because they were

01:06:23
having problems with vacuum.

01:06:24
Because back then, the

01:06:26
settings were all wrong.

01:06:28
So you would end up bloating

01:06:29
yourself out of your disk space.

01:06:32
I ended up vacuuming their systems

01:06:34
down from I think

01:06:36
it was 20 gigs down to like 5.

01:06:40
And back then, that

01:06:41
was a lot of disk space.

01:06:43
I was just about

01:06:45
to say that in 2005,

01:06:47
20 gigabytes of

01:06:49
disk space was a lot.

01:06:53
But back then, the

01:06:55
problem with vacuum

01:06:55
was you actually had to set the

01:06:57
size of the free space map.

01:07:00
And the default was way too small.

01:07:02
So what would happen is vacuum

01:07:03
would actually only

01:07:04
keep track of the last 200

01:07:07
unused reusable rows

01:07:10
by default.

01:07:11
But by default, it only kept track

01:07:13
of the first 200.

01:07:15
So if you had more

01:07:16
than that, even if you

01:07:17
were vacuuming

01:07:18
constantly, it would still

01:07:20
bloat like a little bit every day

01:07:21
until your whole disk was used.

01:07:24
So I actually had

01:07:25
to clean all that up

01:07:26
or their system

01:07:27
was going to crash.

01:07:28
They were days away from going

01:07:29
down when I joined.

01:07:32
They had already added all the

01:07:34
disks they could.

01:07:35
And back then, you couldn't just

01:07:36
add virtual disk space.
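
The free-space-map failure mode Shaun describes can be sketched in a few lines of Python. The numbers here (a map tracking 200 reusable pages, 500 pages freed per day) are illustrative rather than the historical defaults, which lived in the old max_fsm_pages and max_fsm_relations settings removed in Postgres 8.4:

```python
def simulate_bloat(days, freed_pages_per_day, fsm_capacity):
    """Illustrative model: space freed by vacuum is only reusable if the
    fixed-size free space map has a slot for it. Anything beyond the map's
    capacity is forgotten, so the table grows a little every day even
    under constant vacuuming."""
    bloat_pages = 0
    for _ in range(days):
        untracked = max(0, freed_pages_per_day - fsm_capacity)
        bloat_pages += untracked  # forgotten free space, never reused
    return bloat_pages

# A map tracking only 200 pages while 500 are freed daily leaks
# 300 pages of unreclaimable bloat per day:
print(simulate_bloat(days=30, freed_pages_per_day=500, fsm_capacity=200))  # 9000
```

The fix in Shaun's story amounts to raising the map's capacity (or, after 8.4, letting Postgres manage the FSM automatically) so the second argument never exceeds the third.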

01:07:38
I know those situations, not in

01:07:42
the Postgres or database

01:07:43
space, but in the software

01:07:45
development space where--

01:07:47
same thing, I literally joined

01:07:49
days before it all

01:07:50
would fall apart.

01:07:55
Let's say those are not

01:07:56
the best days to join.

01:07:59
Hey, that's why

01:08:00
they hired you, right?

01:08:02
Exactly.

01:08:03
All right.

01:08:04
So let's talk a little

01:08:07
bit about these days.

01:08:09
Right now, you're with Tembo.

01:08:12
And you just have

01:08:14
this very nice blog post

01:08:17
that blew up on Hacker News for

01:08:19
all the wrong reasons.

01:08:22
Well, I mean, we created it for

01:08:24
all the right reasons.

01:08:26
And so let me just

01:08:28
start on Tembo a little bit.

01:08:29
So Tembo is like they

01:08:33
are all in on Postgres.

01:08:35
We are ridiculously all in.

01:08:38
Basically, everything we

01:08:39
do is all open sourced.

01:08:40
You can go to Tembo.io on GitHub.

01:08:42
And basically, our

01:08:43
entire stack is there.

01:08:46
And we even just

01:08:47
released our on-prem.

01:08:50
So you can actually use our stack

01:08:51
on your local system

01:08:52
and basically have a Kubernetes

01:08:55
cloud management

01:08:56
thing for all the

01:08:57
clusters you want to manage.

01:08:59
And it'll just be

01:09:00
our stack of tools.

01:09:02
And the main calling card of Tembo

01:09:04
is probably our--

01:09:05
if you go to trunk,

01:09:07
I think it's called PGT.dev.

01:09:09
We just keep track of

01:09:10
a bunch of extensions.

01:09:11
And it's got a command line tool

01:09:13
to install them,

01:09:14
kind of like a PGXN.

01:09:17
And we're so kind of into this

01:09:19
that we actually

01:09:20
hired the guy who basically

01:09:22
maintained PGXN, David Wheeler.

01:09:26
Because we were like, we need to

01:09:28
kind of hit the extension

01:09:31
drum.

01:09:32
And we're very glad he's like

01:09:35
re-standardizing PGXN too.

01:09:38
He's starting a whole initiative.

01:09:40
And he's got a lot

01:09:42
of buy-in from tons

01:09:44
of different

01:09:44
committers and devs and people

01:09:47
who are really pushing it.

01:09:49
And we want to get--

01:09:51
maybe we'll create the gold

01:09:52
standard of extension networks.

01:09:55
Because the idea is to get it all

01:09:58
so that it's packaged, right?

01:10:00
Kind of like a Debian or an RPM or

01:10:02
whatever package system

01:10:03
you want to use.

01:10:05
It'll just install the package on

01:10:07
your Postgres wherever it is.

01:10:08
Like the source install, if it's

01:10:09
like a package install,

01:10:11
or if it's something

01:10:12
with on your Mac, whatever.

01:10:16
So he's working on that really.

01:10:17
And he's done some demos that are

01:10:18
very impressive.

01:10:19
And it looks like it'll actually

01:10:20
be a great advancement.

01:10:23
But Tembo is-- it's all about open

01:10:28
source Postgres.

01:10:29
And our tools kind of show that.

01:10:32
Like if you've ever heard of Adam

01:10:34
Hendel, he goes by Chuck.

01:10:36
But if you heard of

01:10:37
PGMQ or PG Vectorize,

01:10:41
which kind of makes PG Vector a

01:10:43
little easier to use,

01:10:45
those tools are all

01:10:46
coming from us, basically.

01:10:49
So we're putting our money where

01:10:51
our mouth is, right?

01:10:57
All right.

01:10:59
That's why I joined them.

01:11:00
Because I kept seeing

01:11:00
them pop up on Twitter.

01:11:01
And I'm like, man,

01:11:02
these guys really--

01:11:03
they're really

01:11:03
dedicated to this whole thing.

01:11:10
Yeah, cool.

01:11:11
So back to PG and

01:11:15
high availability.

01:11:17
Why would I need that?

01:11:19
I mean, I know.

01:11:20
But maybe just give the audience a

01:11:22
little bit of a clue.

01:11:26
So high availability--

01:11:29
and I kind of implied this

01:11:30
when I was talking about the

01:11:32
financial company, right?

01:11:33
The whole idea is to make sure

01:11:34
Postgres never goes down.

01:11:36
But there's so much more to it.

01:11:38
I've done conferences.

01:11:41
And I've done webinars.

01:11:42
And I've done trainings.

01:11:44
And I've done the book.

01:11:47
Just covering that

01:11:48
topic, it's essentially

01:11:50
an infinite font of just all the

01:11:53
different ways you can do it,

01:11:54
all the different prerequisites

01:11:55
you need to fulfill,

01:11:56
all the different

01:11:57
things you need to set up

01:12:00
to make it work properly.

01:12:01
But the whole point is

01:12:02
keep your Postgres up.

01:12:04
But you also have to

01:12:05
define what that means.

01:12:07
Where do you put

01:12:07
your Postgres instances?

01:12:09
Where do you put your replicas?

01:12:11
How do you get to them?

01:12:13
Do you need an intermediate

01:12:15
abstraction layer

01:12:17
so that you can connect to that?

01:12:18
And it'll kind of decide where to

01:12:20
send you afterwards

01:12:21
so you don't have any outages as

01:12:23
far as routing is concerned?

01:12:26
It's a very deep topic.

01:12:29
And it's easy to get wrong.

01:12:32
And a lot of the tools out there,

01:12:35
they don't

01:12:35
necessarily get it wrong.

01:12:36
But they expect the

01:12:37
user to get it right.

01:12:40
One of the reasons my book did so

01:12:42
well in certain circles

01:12:43
is because if you want to set up

01:12:45
EFM or repmgr or Patroni

01:12:48
or some other tool, you have to

01:12:51
follow very closely

01:12:53
and know how the tool

01:12:53
works extremely well.

01:12:55
You have to be very familiar with

01:12:57
the documentation.

01:12:58
You can't just follow step by step

01:13:00
and then expect it to

01:13:01
work in a lot of cases.

01:13:02
Now, there's a lot of edge cases

01:13:03
you have to account for.

01:13:04
You have to know

01:13:05
why and the theories

01:13:06
behind the high availability and

01:13:08
how it works a certain way

01:13:11
to really deploy it properly.

01:13:13
So even as a consultant

01:13:16
when I was working at EDB

01:13:18
and 2ndQuadrant, it's easy

01:13:22
to give a stack to a customer

01:13:25
and they can implement it with

01:13:27
your recommendations.

01:13:28
And you can even

01:13:28
set it up for them.

01:13:29
There's always some kind of edge

01:13:30
case that you didn't think of.

01:13:33
So the issue with Postgres, in

01:13:37
kind of my opinion,

01:13:38
is it gives you a lot of tools to

01:13:40
build it yourself,

01:13:42
but it expects you

01:13:42
to build it yourself.

01:13:44
And even the other stack tools,

01:13:45
like I had mentioned earlier,

01:13:46
like repmgr

01:13:46
or EFM or Patroni,

01:13:50
those, or pg_auto_failover,

01:13:52
another one that came out

01:13:53
recently.

01:13:55
They work, but you've

01:13:57
got to install them.

01:13:58
And you really do need access to

01:14:00
an expert that can come in

01:14:01
if something goes wrong.

01:14:03
Because if something goes wrong,

01:14:04
you're kind of on your own

01:14:05
in a lot of ways.

01:14:06
Postgres doesn't really have an

01:14:07
inherent integral way

01:14:09
of managing itself as a cluster.

01:14:12
It's more of like a

01:14:13
database that just happens

01:14:14
to be able to talk to other nodes

01:14:15
to keep them up to date

01:14:17
with sync and whatnot.

01:14:20
So it's important, but it's also

01:14:23
hard to do right.

01:14:25
I think you mentioned one

01:14:27
important thing.

01:14:27
It is important to

01:14:29
upfront define your goals.

01:14:34
How much uptime

01:14:36
do you really need?

01:14:38
Because one thing that not only

01:14:40
with Postgres, but in general,

01:14:42
whenever we talk about failure

01:14:45
tolerance systems,

01:14:47
high availability, all

01:14:48
those kinds of things,

01:14:49
what a lot of

01:14:50
people seem to forget

01:14:52
is that high

01:14:53
availability or fault tolerance

01:14:56
is a trade-off between how much

01:14:58
time and money do I invest

01:15:01
and how much money do I lose if

01:15:03
something really, well,

01:15:05
you could say,

01:15:06
s*** hits the fan, right?

01:15:08
Exactly.

01:15:09
And that's the thing.

01:15:11
Companies like the financial

01:15:12
company I worked at,

01:15:13
they took high

01:15:14
availability to a fault.

01:15:17
They had two systems in

01:15:20
their main data center.

01:15:21
They had two systems in their

01:15:22
disaster recovery data center

01:15:24
that were fully

01:15:24
synced and up to date.

01:15:26
They had backups that were on both

01:15:29
local systems taken

01:15:30
every day that was also

01:15:31
shipped to a system that

01:15:33
had seven days worth locally.

01:15:35
And that was sent to tape, which

01:15:37
was then sent to Glacier,

01:15:40
which according to SEC rules, they

01:15:41
had to keep for seven years.

01:15:42
So someone could

01:15:44
come into our systems

01:15:46
and maliciously erase

01:15:47
literally everything,

01:15:48
and we'd be back up in an hour.

01:15:51
It was very resilient.

01:15:53
And part of that was our design

01:15:55
and the amount of money

01:15:57
we dedicated

01:15:57
toward it because that

01:15:58
was a very expensive

01:15:59
deployment because that's

01:16:01
at least 10 servers right there.

01:16:05
But also, when you say you could

01:16:07
be back up in an hour,

01:16:10
the question is, how much money do

01:16:11
you lose in that hour still?

01:16:14
Well, like I said, that was like

01:16:15
someone actually walks in

01:16:17
and literally

01:16:17
smashes all the servers.

01:16:19
We have to go from a backup and

01:16:22
actually rebuild everything

01:16:22
from scratch.

01:16:24
In most cases, we'd be up--

01:16:26
and this is where

01:16:26
your RTO and RPO come in,

01:16:29
the recovery time objective and

01:16:30
your recovery point objective.

01:16:32
Basically, how much do

01:16:33
you want to spend to say,

01:16:35
I want to be down

01:16:36
for one minute or less?

01:16:37
Or if I am down

01:16:39
for that one minute,

01:16:40
how much data will I lose?

01:16:42
Because the amount

01:16:43
of money you spend

01:16:43
or the amount of resources you

01:16:44
dedicate toward that thing

01:16:46
will determine the end result of

01:16:48
how much data you might lose

01:16:49
or how much money you'll need to

01:16:52
spend to make sure you're

01:16:52
down for less than a minute.

01:16:55
That kind of thing.
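
The trade-off Chris and Shaun are circling here can be put into a toy calculation. All figures below are hypothetical, just to show the shape of the RTO/RPO decision:

```python
def expected_annual_loss(outages_per_year, hours_per_outage, loss_per_hour):
    """Expected yearly revenue lost to downtime."""
    return outages_per_year * hours_per_outage * loss_per_hour

def ha_pays_off(annual_ha_cost, loss_without_ha, loss_with_ha):
    """An HA investment is worth it when the loss it prevents
    exceeds what it costs to build and run."""
    return loss_without_ha - loss_with_ha > annual_ha_cost

# Hypothetical numbers: 4 outages a year, $50k lost per hour of downtime.
loss_plain = expected_annual_loss(4, 2.0, 50_000)     # ~2h manual recovery each
loss_ha = expected_annual_loss(4, 1 / 60, 50_000)     # ~1 min automatic failover
print(loss_plain, loss_ha, ha_pays_off(150_000, loss_plain, loss_ha))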

01:16:56
I think that becomes more

01:16:57
important in the cloud age.

01:17:00
So perfect bridge to cloud,

01:17:02
Postgres and cloud, perfect.

01:17:04
So you said setting

01:17:05
up HA is complicated

01:17:07
because you have to

01:17:08
install the tools.

01:17:09
You have to configure them.

01:17:11
These days, when you

01:17:12
go and deploy Postgres

01:17:15
on something like Kubernetes, you

01:17:17
would have an operator

01:17:18
claiming, at least, to do

01:17:19
all the magic for you.

01:17:21
What is your opinion on the magic?

01:17:23
Yeah, so my opinion on

01:17:26
that is it evolved a lot.

01:17:27
Back when I first started seeing

01:17:29
containerized systems

01:17:30
like Docker and that kind of

01:17:33
thing, my opinion was,

01:17:36
I don't know if I'd run a

01:17:37
production system

01:17:38
in a container, right?

01:17:39
Because it just

01:17:39
seems a little shady.

01:17:41
But that was 10 years ago or more.

01:17:46
Now that Kubernetes

01:17:47
tools and that kind of thing

01:17:48
have matured a lot, what

01:17:50
you get out of this now

01:17:51
is you get a level of automation

01:17:53
that just is not

01:17:54
possible using

01:17:55
pretty much anything else.

01:17:59
And I think what

01:18:00
really sold it to me was--

01:18:02
so you may have heard

01:18:04
of Gabriele Bartolini.

01:18:06
He's basically heads up the team

01:18:09
that writes and maintains

01:18:13
Cloud Native Postgres, the Cloud

01:18:15
Native PG operator.

01:18:17
So we'll talk about operators

01:18:18
probably a bit later.

01:18:19
But the point of

01:18:21
that was back when--

01:18:23
and a 2ndQuadrant was before

01:18:25
they were bought by EDB,

01:18:27
we were selling our BDR tool for

01:18:31
bi-directional replication

01:18:33
for Postgres, right?

01:18:34
So multi-master.

01:18:36
And we needed a way to

01:18:39
put that in a Cloud service

01:18:41
for obvious purposes so we could

01:18:43
sell it to customers.

01:18:45
And that meant we

01:18:45
needed an operator.

01:18:47
Well, before Cloud

01:18:50
Native Postgres existed,

01:18:51
there was the BDR

01:18:53
operator that we were

01:18:55
circulating internally for customers.

01:18:58
And one day while

01:19:01
we were in Italy--

01:19:02
because every employee who worked

01:19:04
at 2ndQuadrant

01:19:05
got sent to Italy

01:19:06
for a couple of weeks

01:19:07
to get oriented with the team,

01:19:09
that kind of thing.

01:19:10
And during that time

01:19:11
when I was there in 2020,

01:19:14
I think I was there for February,

01:19:15
for the first two

01:19:16
weeks of February.

01:19:17
He demoed that.

01:19:20
And it kind of blew me away.

01:19:23
We were using other tools to

01:19:25
deploy containers.

01:19:27
And it was basically Ansible to

01:19:30
automate the deployment

01:19:32
with Terraform.

01:19:33
And then you kind of set

01:19:34
everything up and then

01:19:35
deploy everything.

01:19:36
And it takes minutes to

01:19:39
set up all the packages

01:19:40
and get everything deployed and

01:19:42
reconfigure everything.

01:19:43
And then you have to wait for

01:19:44
syncs and whatnot

01:19:46
make sure everything's proper.

01:19:48
On someone's laptop, they set up

01:19:51
Kubernetes Docker deployment.

01:19:53
Kind, I think we were

01:19:54
using at that point,

01:19:56
Kubernetes in Docker.

01:19:58
And in less than a

01:20:01
minute, he had on his laptop

01:20:04
set up a full Kubernetes cluster

01:20:06
of three replicating,

01:20:09
bidirectional replicating, so

01:20:10
three multi-master nodes

01:20:12
of Postgres on his

01:20:13
laptop in less than a minute.

01:20:15
And I was just

01:20:16
like, my mind was blown.

01:20:19
And the thing is,

01:20:21
basically, it's a new concept.

01:20:24
The data is what matters.

01:20:26
The nodes themselves are

01:20:28
completely unimportant.

01:20:30
And that's why to kind of bring

01:20:32
this back around,

01:20:34
when Cloud Native Postgres was

01:20:35
released by EnterpriseDB

01:20:37
kind of as an open

01:20:39
source tool for Postgres

01:20:41
and not the

01:20:41
bidirectional replication

01:20:42
stuff for just Postgres.

01:20:45
The reason that was important was

01:20:46
because it's an ethos.

01:20:49
The point is your compute nodes

01:20:52
throw them away.

01:20:53
They don't matter.

01:20:54
If one goes down, you

01:20:55
provision a new one.

01:20:57
If you need to upgrade your

01:20:59
tooling or the packages,

01:21:02
you throw away the

01:21:03
old container image,

01:21:05
you bring up a new one.

01:21:06
The important part is your data.

01:21:08
And as long as your data is on

01:21:10
your persistent volume claim

01:21:12
or whatever you provision that as,

01:21:15
the container itself, the version

01:21:17
of Postgres you're

01:21:18
running, those aren't

01:21:19
nearly as important.

01:21:21
So it complicates

01:21:24
debugging to a certain extent.

01:21:26
And we can kind of talk

01:21:27
about that maybe later.

01:21:28
But the important part is it

01:21:31
brings high availability

01:21:32
to a level that can't really be

01:21:34
described using the old methods.

01:21:36
Because the old method was you

01:21:39
create two or three replicas.

01:21:42
And if one goes down, you've got a

01:21:43
monitoring system

01:21:44
that switches over to

01:21:44
one of the alternates.

01:21:46
And then the other one might come

01:21:48
back or might not.

01:21:49
And then you rebuild it if it

01:21:50
does, that kind of thing.

01:21:52
With the Kubernetes approach or

01:21:54
the container approach,

01:21:55
as long as your

01:21:56
storage wasn't corrupted,

01:21:59
you can just bring

01:22:00
up a new container

01:22:00
to represent that storage.

01:22:02
And you can

01:22:03
actually have a situation

01:22:04
where the primary goes down

01:22:05
because maybe it

01:22:08
got OOM killed for some reason.

01:22:11
It can actually go down, get a new

01:22:13
container provisioned,

01:22:14
and come back up

01:22:15
before the monitors even

01:22:17
notice that there was an outage

01:22:19
and the switch to a replica

01:22:20
and promote it.

01:22:22
There's a whole

01:22:24
mechanism of systems

01:22:25
in there to kind of reduce the

01:22:27
amount of timeline switches

01:22:28
and other kind of complications

01:22:30
behind the scenes.

01:22:31
So you have a

01:22:32
cohesive, stable timeline.

01:22:37
You maximize your uptime.

01:22:39
They've got layers to

01:22:40
redirect connections

01:22:42
from the outside world

01:22:43
through either Traefik

01:22:45
or some other kind of proxy to get

01:22:51
into your actual cluster.

01:22:53
You always get an

01:22:54
endpoint somehow.

01:22:56
And that's something

01:22:57
that could go horribly wrong,

01:22:58
but that's true for anything.

01:23:00
But the ethos of your machines

01:23:03
aren't important.

01:23:05
It spoke to me a little bit

01:23:06
because it brings you

01:23:07
to a level that, sure, bare

01:23:09
hardware is great.

01:23:11
And I actually prefer it.

01:23:12
I've got servers in my

01:23:13
basement specifically

01:23:14
for testing clusters

01:23:16
and Postgres and whatnot.

01:23:18
But if you have the

01:23:20
luxury of provisioning

01:23:22
what you need at the time, if I

01:23:26
want more compute nodes,

01:23:29
like I said, throw away my image, bring

01:23:30
up a new one that's

01:23:31
got more

01:23:32
resources allocated to it,

01:23:33
suddenly I've grown vertically.

01:23:36
And that's something you can't

01:23:38
really do with bare hardware,

01:23:39
at least not very easily.

01:23:42
So then I was like, well, maybe

01:23:43
this whole container thing

01:23:43
isn't really a problem, right?

01:23:46
So yeah, it's all because of my

01:23:49
time in 2ndQuadrant

01:23:50
and Gabriele's team that high

01:23:54
availability does

01:23:55
belong in the cloud.

01:23:56
And you can run production in the

01:23:58
cloud on Kubernetes

01:23:59
and containers.

01:24:01
And in fact, I encourage it.

01:24:03
I love that.

01:24:03
I love that.

01:24:04
I also think high

01:24:05
availability in cloud,

01:24:07
and especially cloud

01:24:08
native are concepts

01:24:09
that are perfectly in line and

01:24:11
perfectly in sync.

01:24:13
Unfortunately, we're out of time.

01:24:16
I didn't want to

01:24:16
stop you, but I think

01:24:18
I have to invite you again and

01:24:21
keep talking about that.

01:24:23
But one last question.

01:24:25
One last question.

01:24:25
By the way, I love when

01:24:28
you said that containers

01:24:29
were a new thing

01:24:30
like 10 years ago,

01:24:31
unless you came from the

01:24:32
Solaris or BSD world where

01:24:34
those things were--

01:24:35
Jails!

01:24:36
But it's still different, right?

01:24:39
You didn't have this

01:24:40
orchestration layer on top.

01:24:41
The whole ecosystem evolved very

01:24:43
differently in the Linux space.

01:24:45
Anyway, last question.

01:24:50
What do you think is

01:24:50
the next big thing?

01:24:51
What is upcoming

01:24:53
in the Postgres,

01:24:54
but Linux, the

01:24:55
container world, what do you

01:24:56
think is amazing on the horizon?

01:25:01
I mean, I hate to be cliche here,

01:25:04
but it's got to be AI.

01:25:07
If you look at pgvector,

01:25:09
it's basically

01:25:12
allowing you to do vectorized

01:25:13
similarity searches right

01:25:15
in Postgres.
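
What pgvector adds to Postgres is, at its core, nearest-neighbor search over embeddings. A minimal pure-Python sketch of the underlying idea, using toy three-dimensional vectors (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors:
    1.0 means same direction, near 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" for three documents.
docs = {
    "postgres ha guide": [0.9, 0.1, 0.0],
    "vector search intro": [0.1, 0.9, 0.2],
    "pasta recipe": [0.0, 0.2, 0.9],
}
query = [0.2, 0.8, 0.1]  # embedding of the user's question
best = max(docs, key=lambda name: cosine_similarity(docs[name], query))
print(best)  # "vector search intro" scores highest against this query
```

pgvector exposes the same operation as distance operators inside SQL (for example `<=>` for cosine distance), so the scan runs in the database, index-assisted, instead of in application code.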

01:25:17
And I think Timescale even

01:25:18
released pgvectorscale,

01:25:21
which is an extension that makes

01:25:22
pgvector even better.

01:25:24
It makes it apparently faster than

01:25:25
dedicated vector

01:25:26
databases like Pinecone.

01:25:29
And it's just an

01:25:31
area that if you're

01:25:32
going to do any kind of retrieval-

01:25:35
augmented generation,

01:25:36
like RAG searches, or if you're

01:25:39
doing any LLM work at all,

01:25:42
if you're building

01:25:42
chatbots, or if you're just

01:25:44
doing, like I said,

01:25:46
augmented searches,

01:25:47
any of that kind of

01:25:47
work, you're going

01:25:49
to be wanting your data that's in

01:25:51
Postgres already, right?

01:25:52
You're going to want to make that

01:25:54
available to your AI.

01:25:57
And the easiest way to

01:25:58
do that is with pgvector.

01:26:00
Tembo even wrote an extension we

01:26:02
call pg_vectorize,

01:26:03
which automatically maintains your

01:26:05
embeddings, which

01:26:06
is how you kind of interface your

01:26:08
searches with the text.

01:26:09
And then you can feed

01:26:10
that back into an LLM.

01:26:11
It also has the

01:26:13
ability to do that for you.

01:26:14
Like it can send

01:26:15
messages directly to OpenAI.

01:26:17
We can also interface

01:26:18
with arbitrary paths

01:26:20
so you can set up an Ollama or

01:26:23
something on a server

01:26:25
or locally.

01:26:26
And then you can set

01:26:27
that to be the end target.

01:26:28
So you can even keep your messages

01:26:30
from hitting external resources

01:26:33
like Microsoft or OpenAI

01:26:35
or whatever, just

01:26:36
do it all locally.

01:26:39
And that's all very important.

01:26:41
So that I think is going to be--

01:26:44
it's what everyone--

01:26:44
not everyone,

01:26:45
but a lot of people

01:26:46
are focusing on it.

01:26:47
And a lot of

01:26:47
people find it annoying.

01:26:48
It's another AI thing, right?

01:26:50
But I wrote two blog posts on this

01:26:52
where I wrote a RAG app using some

01:26:56
Python and pgvector.

01:26:58
And then I wrote a second one

01:26:59
where I used pg_vectorize

01:27:02
and I cut my

01:27:03
Python code by like 90%.

01:27:05
And it just

01:27:06
basically talks to Postgres.

01:27:07
Postgres is doing it all.

01:27:09
And that's because of the

01:27:10
extension ecosystem, right?

01:27:12
And that's one of

01:27:12
the reasons Postgres

01:27:13
is kind of on the top of

01:27:15
everyone's mind right now

01:27:15
because it's leading the charge.

01:27:18
And it's bringing a lot

01:27:19
of people in that may not

01:27:21
have been interested before.

01:27:23
I love that.

01:27:24
And I think that's a perfect

01:27:25
sentence to end the show.

01:27:30
The Postgres ecosystem or

01:27:32
extension system

01:27:33
is just incredible.

01:27:34
And there's so much stuff that

01:27:36
we've seen so far

01:27:37
and so much more stuff to come.

01:27:40
I couldn't agree more.

01:27:42
Yeah, it's just

01:27:42
the beginning, man.

01:27:44
Yeah, let's hope

01:27:44
that AI is not going

01:27:46
to try to build our HA systems.

01:27:49
And I'm happy.

01:27:52
Maybe not yet, yeah.

01:27:54
Yeah, not yet at least.

01:27:55
Exactly.

01:27:56
All right, thank

01:27:57
you for being here.

01:27:58
It was a pleasure.

01:28:00
As I said, I think I have to

01:28:01
invite you again somewhere

01:28:02
in the future.

01:28:03
More than willing.

01:28:07
And to the audience, thank you for

01:28:09
listening in again.

01:28:11
I hope you come back next week.

01:28:13
And thank you very much.

01:28:16
Take care.

01:28:18
The Cloud Commute podcast is sponsored by

01:28:20
simplyblock, your own elastic

01:28:21
block storage engine for the cloud.

01:28:23
Get higher IOPS and low, predictable

01:28:25
latency while bringing down your

01:28:26
total cost of ownership.

01:28:28
www.simplyblock.io