PostgreSQL mistakes and how to avoid them - Jimmy Angelakos
Cloud CommuteMay 03, 2024x
10
00:21:3719.8 MB

PostgreSQL mistakes and how to avoid them - Jimmy Angelakos

Jimmy Angelakos, a freelance consultant, talks about his experiences with customers running PostgreSQL on bare-metal, in the cloud, and on Kubernetes. He also talks about his new book "PostgreSQL Mistakes and How to Avoid Them" (discount link below).

If you have questions to Jimmy, you can reach him here:

Additional show notes:

The Cloud Commute Podcast is presented by simplyblock (https://www.simplyblock.io)


01:00:00
That has anti-patterns that are

01:00:03
not really restricted to Postgres.

01:00:05
It's things like, don't improvise,

01:00:09
don't create your own

01:00:10
distributed systems.

01:00:12
There's people that have spent

01:00:14
hundreds of thousands of hours

01:00:17
working on these problems, and you

01:00:19
don't need to reinvent the wheel.

01:00:25
You're listening to simplyblock's Cloud Commute Podcast,

01:00:27
your weekly 20 minute

01:00:28
podcast about cloud technologies,

01:00:30
Kubernetes, security,

01:00:32
sustainability, and more.

01:00:34
Welcome back everyone.

01:00:35
Welcome to the next episode of

01:00:37
simplyblock's

01:00:37
Cloud Commute podcast.

01:00:39
Today I have a very

01:00:41
interesting guest,

01:00:41
very different from

01:00:42
the other ones before,

01:00:45
because he's actually an author,

01:00:47
writing a book right now.

01:00:50
Well, I think he already published

01:00:51
one or two at least.

01:00:54
But he'll going to

01:00:55
talk about that himself.

01:00:57
Welcome, Jimmy.

01:00:59
Hi, very nice to be here.

01:01:02
Very nice. Thank

01:01:04
you for being here.

01:01:06
Maybe we just start simple with

01:01:09
the basic stuff.

01:01:10
Who are you?

01:01:11
Where are you from?

01:01:12
What do you do for a

01:01:14
living? Except writing a book.

01:01:18
My name is Jimmy Angelakos,

01:01:21
which is obviously a Greek name.

01:01:24
I live in Edinburgh in Scotland.

01:01:28
I've been working with Postgres

01:01:30
for maybe around 16

01:01:33
years now, exclusively.

01:01:35
I haven't used any other database

01:01:36
in 16 years in a

01:01:38
professional capacity.

01:01:41
Naturally, the time came to share

01:01:46
my experiences and I wrote a

01:01:47
couple of books on this.

01:01:48
Well, I actually co-wrote the

01:01:53
PostgreSQL16

01:01:54
administration cookbook

01:01:56
with my lovely

01:01:58
co-authors Boris Mejías,

01:02:02
Gianni Ciolli, Vibhor Kumar,

01:02:05
and the sadly

01:02:07
departed Simon Riggs,

01:02:11
who was an awesome fellow.

01:02:14
I'd like to pay a little tribute

01:02:17
to him as a person,

01:02:20
as a mentor to the

01:02:21
entire Postgres community.

01:02:23
He will be greatly missed.

01:02:28
Thank you very much.

01:02:29
I appreciate you sharing that

01:02:31
because I think it was last week

01:02:34
at the time of recording.

01:02:37
It is a sad story for the Postgres

01:02:41
community as a whole.

01:02:42
Thank you for sharing that.

01:02:46
From your professional life,

01:02:47
for the last couple of years next

01:02:49
to writing books,

01:02:50
I think you're mostly working as a

01:02:51
consultant with a couple of

01:02:53
different companies and customers.

01:02:56
What do you think is

01:02:57
the most common task?

01:02:59
I mean, you're probably coming in

01:03:01
helping them optimize Postgres,

01:03:04
optimize queries.

01:03:05
Right. I've done all sorts of

01:03:07
things in the past few years,

01:03:08
like training customers to use

01:03:12
Postgres in general,

01:03:14
training them to use Postgres in a

01:03:16
specific way that is

01:03:18
suited to their needs.

01:03:21
I have provided support to

01:03:25
customers who ran Postgres,

01:03:27
and also professional services

01:03:28
like consulting.

01:03:31
I can't really say what the thing

01:03:33
they use the most is or they

01:03:36
request the most,

01:03:37
but I can tell you

01:03:40
a few of the things.

01:03:42
Some customers come in and say,

01:03:44
"My queries aren't

01:03:45
running well. What can I do?"

01:03:47
It's like the most

01:03:49
frequent thing you hear.

01:03:52
Some other people say,

01:03:54
"Tell me what hardware

01:03:56
to buy for Postgres."

01:03:57
You tell them, "I can't really

01:04:00
give you a response because it

01:04:02
really depends on your workload,"

01:04:04
which is the most important

01:04:06
factor, I think, with databases.

01:04:08
Everyone uses them differently.

01:04:09
If it's a database that is widely

01:04:12
used as Postgres

01:04:13
with so many use cases

01:04:15
and so many

01:04:16
different ways to use it,

01:04:19
like you can do analytics on it.

01:04:21
To an extent, you can use it for

01:04:23
transaction processing,

01:04:25
(OLTP), you can use it as a

01:04:28
document database,

01:04:32
with JSONB. There's all sorts of

01:04:33
things you can do.

01:04:34
There's no good answer to the

01:04:36
things that people ask like,

01:04:39
"Give me the best tuning

01:04:40
parameters for Postgres,"

01:04:42
or "How to write a

01:04:45
query the right way."

01:04:46
It really depends on the

01:04:48
amount of data you have,

01:04:50
the type of data you have,

01:04:51
and the sort of queries you're

01:04:52
going to be running.

01:04:54
Yeah, that makes a lot of sense.

01:04:56
It's not only for the Postgres

01:04:58
community or for Postgres.

01:05:00
That is very true

01:05:01
for a lot of things.

01:05:04
From my own personal background,

01:05:07
with a lot of programming

01:05:07
languages or runtime environments,

01:05:09
people ask, "What is the

01:05:10
optimized or the optimal way of

01:05:13
configuring it?"

01:05:14
And they're like, "I don't know.

01:05:16
Can't give you the answer."

01:05:18
So, yeah, I hear

01:05:19
where you're coming from.

01:05:22
All right, so...

01:05:24
Sorry, I'm still having

01:05:25
a little bit of a flu.

01:05:28
So, from your personal background,

01:05:33
you said you've

01:05:34
co-written one book,

01:05:35
but I also hinted on the fact that

01:05:38
you're writing

01:05:39
another book right now,

01:05:40
and I looked a little bit into it

01:05:43
because it's on Manning

01:05:44
and it has Early

01:05:45
Access, which is nice.

01:05:46
But maybe you can give us a little

01:05:48
bit of an insight

01:05:49
of what you're writing about.

01:05:51
Right. So, the book

01:05:54
that is under construction

01:05:56
is called PostgreSQL Mistakes

01:05:59
and how you can avoid them.

01:06:02
So, it's a bit of an anti-how-to.

01:06:05
So, for people that

01:06:06
are used to how-to books,

01:06:08
like, "How do I partition? How do

01:06:10
I do this? How do I do that?"

01:06:12
It's a bit of the

01:06:13
other way around.

01:06:14
I was trying to do this, but

01:06:17
things went wrong.

01:06:19
So, it's experiences

01:06:20
that I've collected

01:06:22
from the things I've

01:06:23
seen our customers do

01:06:26
or the things I've

01:06:27
done in the past.

01:06:29
Right.

01:06:30
And it's really

01:06:31
important to learn from mistakes.

01:06:35
Everyone makes mistakes.

01:06:37
And Postgres is very particular in

01:06:39
how it wants things done.

01:06:43
So, if you get it right, the

01:06:46
database is fantastic.

01:06:48
It works very well with

01:06:50
excellent performance.

01:06:51
And when you start to do

01:06:56
things a different way,

01:06:57
you can see different results.

01:07:01
And that's

01:07:01
basically the whole idea.

01:07:03
There's three chapters.

01:07:05
Three chapters up on the web now.

01:07:08
And there's a huge fourth chapter

01:07:10
that's being

01:07:11
published as we speak.

01:07:16
That has anti-patterns that are

01:07:19
not really restricted to Postgres.

01:07:21
It's things like, don't improvise,

01:07:25
don't create your own

01:07:26
distributed systems.

01:07:28
There's people that have spent

01:07:30
hundreds of thousands of hours

01:07:33
working on these problems, and you

01:07:34
don't need to reinvent the wheel.

01:07:38
I hear you.

01:07:38
As you said, there's three

01:07:40
chapters out right now.

01:07:42
I haven't seen the fourth one yet,

01:07:43
so I think I have to look into

01:07:45
that right after the recording.

01:07:47
Manning are in the process of

01:07:49
publishing it as we speak.

01:07:49
All right, cool.

01:07:51
But so far, I really

01:07:53
like the second chapter

01:07:56
and you bringing up all

01:07:57
of the SQL code examples

01:08:00
and showing the execution plans.

01:08:02
And I think just by saying the

01:08:04
word execution plan

01:08:05
or the term execution plan,

01:08:06
I probably lost half of the

01:08:08
audience right now.

01:08:09
So maybe you can give them a

01:08:10
little bit of a feeling of what is

01:08:12
an execution plan?

01:08:13
Why is it so important to

01:08:14
understand those things?

01:08:15
Yeah, so Postgres has a

01:08:17
quasi-intelligent query planner,

01:08:22
which basically examines the way

01:08:26
your query is written

01:08:28
and produces a plan on how it's

01:08:31
going to get executed by the

01:08:33
database server.

01:08:34
It's like, oh, they wrote this,

01:08:37
where, this, and that,

01:08:40
and it looks like a join.

01:08:42
So I'm going to perform

01:08:43
a join of these tables

01:08:45
and then I'm going to order the

01:08:47
results in this.

01:08:48
So that's the execution plan.

01:08:50
It's basically telling you how the

01:08:51
database is going to

01:08:52
execute your SQL query.

01:08:56
Now, the planner takes into

01:08:58
account things such as how much

01:09:00
memory do you have

01:09:01
or how fast are your disks that

01:09:04
you've already specified in the

01:09:06
Postgres configuration.

01:09:07
It also takes into account things

01:09:09
like what's the

01:09:10
nature of the data?

01:09:12
What's the cardinality,

01:09:13
let's say, in your tables?

01:09:16
And these are things that are

01:09:17
updated

01:09:19
automatically by Postgres itself

01:09:21
in its statistics tables.

01:09:23
So it produces, most of the time,

01:09:25
a really good plan.

01:09:28
And what is a good plan?

01:09:29
It's the cheapest plan in

01:09:31
terms of arbitrary cost.

01:09:34
And arbitrary cost is calculated

01:09:37
using those factors

01:09:39
that I just mentioned.

01:09:41
And it iterates through many plans

01:09:45
for the execution,

01:09:47
chooses the cheapest one, which

01:09:48
will probably end up

01:09:49
being the fastest one

01:09:51
to execute in real world terms.

01:09:55
And seeing the execution plans is

01:09:58
key to understand why your queries

01:10:00
are running well

01:10:01
or why they're running slowly.

01:10:04
Because then you can see, ah, this

01:10:06
is what Postgres was trying to do.

01:10:08
So maybe I should force its hand

01:10:10
by writing this

01:10:11
slightly differently.

01:10:14
Yeah, that's true.

01:10:16
I think my personal favorite

01:10:18
example is a

01:10:20
common table expression,

01:10:22
which ends up being a join because

01:10:24
the query planner understands now

01:10:25
a join is actually better.

01:10:26
I don't need to do the temporary

01:10:29
heap table to store the

01:10:32
intermediate result.

01:10:35
Cool, yeah.

01:10:36
So we kind of hinted where people

01:10:38
can find the early access version.

01:10:41
It's at Manning.

01:10:42
Do you want to add

01:10:43
anything more to that?

01:10:46
Maybe have a specific simple URL

01:10:50
or something where

01:10:51
people can find it.

01:10:54
I can share the URL, but I

01:10:55
certainly cannot spell it out.

01:10:58
OK, that's fair enough.

01:10:59
We're going to put

01:10:59
it in the show notes.

01:11:01
That's totally fine.

01:11:02
Thanks very much.

01:11:04
Yeah, I think it's going to be an

01:11:05
interesting book because it's real

01:11:08
world use cases.

01:11:10
And where it isn't a real world

01:11:13
use case, it's close enough.

01:11:16
And I will tell

01:11:17
you so in the text.

01:11:19
That is true.

01:11:20
And I agree.

01:11:20
As I said, I've well kind of read

01:11:23
through the first three.

01:11:25
I read as much as I had time, but

01:11:28
I really enjoyed it.

01:11:30
And many of those code examples

01:11:33
you brought up, as I said,

01:11:34
especially in the second chapter,

01:11:36
they were like, yes, either I've

01:11:38
been there or I had people helping

01:11:40
with that as well.

01:11:42
I've worked for a Postgres based

01:11:45
startup in the past.

01:11:47
And we had people asking pretty

01:11:49
much the same

01:11:50
questions over and over again.

01:11:52
So yes, for everyone using

01:11:54
Postgres or starting using

01:11:55
Postgres, it's probably a pretty,

01:11:57
pretty good pick.

01:11:59
Thank you.

01:11:59
I appreciate that.

01:12:01
Yeah, as you know, people are

01:12:03
familiar with other databases

01:12:05
because Postgres has most recently

01:12:10
exploded in popularity.

01:12:12
It was kind of a niche database

01:12:13
for a few years.

01:12:15
And now it looks like all the

01:12:17
enterprises are using it, all the

01:12:19
hyperscalers are starting to use

01:12:21
it, like AWS, Google, Azure.

01:12:27
This means that they have

01:12:29
recognized the value that Postgres

01:12:32
brings to the table.

01:12:34
Yeah, I agree.

01:12:35
And I think it's kind of

01:12:36
interesting because you kind of

01:12:37
hinted at that earlier.

01:12:39
But you can do a lot

01:12:40
of things with Postgres.

01:12:42
There is a lot of stuff in

01:12:44
Postgres itself.

01:12:45
If you want document database, you

01:12:47
have XML and JSON.

01:12:50
If you want key

01:12:51
value, you have hstore.

01:12:53
But there is also a really good

01:12:56
extensibility to Postgres, giving

01:12:58
you the chance to

01:12:59
plug everything else in,

01:13:00
like time series, graph databases.

01:13:03
I don't know what else.

01:13:04
You probably could define Postgres

01:13:08
as the actual really only in the

01:13:10
world multimodal database.

01:13:13
Right, yeah.

01:13:14
And we were actually considering

01:13:15
of changing the description of

01:13:17
Postgres on the website,

01:13:18
where you go in and it says it's

01:13:20
an object relational database,

01:13:22
which is kind of a formal,

01:13:23
traditional way to put it.

01:13:26
But nowadays, you're right.

01:13:28
I think it's more of

01:13:29
a multimodal database.

01:13:31
And I think that is also the term

01:13:33
that Simon Riggs preferred.

01:13:38
Because it does all of these

01:13:40
things and can also let you do

01:13:42
things that the developers of

01:13:45
Postgres hadn't even thought of,

01:13:47
because of the extension system.

01:13:49
Like a very famous extension is

01:13:51
PostGIS, which is the GIS (geospatial)

01:13:55
capabilities for Postgres.

01:13:59
And is now considered the gold

01:14:01
standard in

01:14:02
geographical databases.

01:14:04
True.

01:14:05
From an open source extension to

01:14:07
an open source database.

01:14:09
And there's like thousands of

01:14:11
people that are professionally

01:14:14
employed to use this extension in

01:14:17
their day jobs, which is amazing.

01:14:20
True.

01:14:21
I agree.

01:14:23
So let me see.

01:14:25
Let me flip back a little bit.

01:14:27
I mean, we're

01:14:27
officially a cloud podcast.

01:14:29
We talked a lot about

01:14:30
the cool Postgres world.

01:14:32
And I was part of

01:14:33
a Postgres world.

01:14:34
I was part of the Java world.

01:14:35
So that is mostly the

01:14:37
guests I had so far.

01:14:39
But because we're a cloud podcast,

01:14:41
what do you think, like working

01:14:43
with all the different customers,

01:14:46
what is your feeling?

01:14:47
Like how many people are actually

01:14:48
deploying Postgres in the cloud,

01:14:51
in Kubernetes, in EC2,

01:14:54
or anything like that?

01:14:58
Well, the company I'm working with

01:15:00
right now are using it on RDS.

01:15:07
They're using RDS Postgres because

01:15:10
it suits their use

01:15:12
case better in the sense

01:15:13
that they don't have a team that

01:15:17
wants to worry about replication

01:15:20
and backups and things like that.

01:15:23
And availability zones, they want

01:15:25
that handled as a service.

01:15:29
And that fits their

01:15:31
use case quite well.

01:15:32
When you want more flexibility,

01:15:34
you can still use the cloud.

01:15:37
You can run, for example, Postgres

01:15:40
on Azure boxes or EC2 boxes or

01:15:43
whatever you want.

01:15:45
But then you have to take care of

01:15:47
these things yourself.

01:15:51
Right.

01:15:52
But it still gives you freedom

01:15:55
from having to worry about hard

01:15:57
drives and hardware and purchase

01:16:01
orders and things like that.

01:16:03
You just send off a check every

01:16:05
month and you're done.

01:16:09
Now, Kubernetes is

01:16:11
an interesting case.

01:16:13
There's a couple of

01:16:14
operators for Postgres.

01:16:16
The most recent one is Cloud

01:16:18
Native PG, which is starting to

01:16:24
get supported and getting traction

01:16:26
from the Cloud Native Computing

01:16:28
Foundation, which is great.

01:16:32
And they are trying to do things

01:16:36
in a different way that

01:16:37
is totally cloud native.

01:16:40
So everything is defined as a

01:16:42
resource in Kubernetes.

01:16:45
But the resources map to things

01:16:48
that are well known in Postgres,

01:16:50
like clusters and nodes and

01:16:52
backups and actual things so that

01:16:56
you don't have to perform black

01:17:00
magic like running it in a pod,

01:17:02
but also having to configure the

01:17:05
pod manually to talk to another

01:17:07
pod that is your

01:17:08
replicant, things like that.

01:17:11
And there are other operators that

01:17:13
have evolved over time to

01:17:15
approximate this ease of use.

01:17:21
I think the Crunchy Data

01:17:23
Operator comes to mind.

01:17:25
It started off

01:17:25
being very imperative.

01:17:31
They had a command line utility

01:17:33
that created clusters and so on.

01:17:35
And now they've turned it into a

01:17:36
declarative, which is more cloud

01:17:39
native, more preferred by the

01:17:41
Kubernetes world.

01:17:44
I think these two are the major

01:17:46
Postgres things that I've seen in

01:17:50
Kubernetes, at least that I've

01:17:54
seen in use the past few years.

01:17:58
There are still things that

01:18:00
haven't been sorted because, as we

01:18:02
said, Postgres is super flexible.

01:18:05
And this flexibility and the ease

01:18:08
of use of Kubernetes, where

01:18:11
everything is taken care of

01:18:12
automatically, comes at a cost.

01:18:14
You have reduced flexibility when

01:18:15
you're on Kubernetes.

01:18:19
So there's things that haven't

01:18:21
been totally worked out yet, like

01:18:23
how do you one click migrate from

01:18:26
a cluster that is outside

01:18:28
Kubernetes to something that is

01:18:30
running in Kubernetes?

01:18:32
Or can you take a backup that was

01:18:34
produced elsewhere and create a

01:18:35
cluster in Kubernetes, a Postgres

01:18:37
cluster from that backup?

01:18:40
Now, once they have these things

01:18:41
sorted and also hardware support

01:18:47
is very important when you're

01:18:50
talking to databases,

01:18:51
I think we'll see many more people

01:18:52
going to Postgres on

01:18:55
Kubernetes in production.

01:18:57
But specifically hardware and

01:19:01
specifically disk performance and

01:19:04
throughput and latency,

01:19:08
you have to get into the hardware

01:19:09
nitty gritty of Kubernetes to take

01:19:13
maximum advantage of Postgres

01:19:14
because as a

01:19:16
database, it loves fast disks.

01:19:19
Generally speaking, the faster

01:19:21
your disk, the

01:19:22
faster Postgres will go.

01:19:25
That is true. And just like a

01:19:28
shameless plug, we're

01:19:29
working on something.

01:19:32
But because we're running out of

01:19:34
time already, 20 minutes

01:19:35
is always so super short.

01:19:38
What do you think is going to be the next

01:19:40
thing for Postgres, the database

01:19:42
world, the cloud

01:19:43
world, whatever you like.

01:19:45
What do you think

01:19:45
is the next thing?

01:19:47
I can't give you an answer, but

01:19:48
you can go search on YouTube and

01:19:53
you can find the lovely Simon's

01:19:56
last contribution to

01:19:58
the Postgres community.

01:20:00
He gave a talk at PostgreSQL

01:20:02
Conference Europe last December

01:20:04
where he said "Postgres, the next 20

01:20:08
years" or something to that effect.

01:20:10
And he predicted things, how

01:20:13
things will go for Postgres in the

01:20:15
future and future directions.

01:20:17
That's a very interesting talk for

01:20:19
anyone who wants to watch that.

01:20:21
I wouldn't want to hazard a guess

01:20:23
because I've seen people just

01:20:27
blindly accept the thing that AI

01:20:30
is the next big thing.

01:20:32
And everything in Postgres and

01:20:33
databases and Java and Python is

01:20:36
going to revolve

01:20:36
around AI in the future.

01:20:39
That remains to be seen.

01:20:41
I like that because normally I

01:20:43
start to say, please don't say AI.

01:20:46
Everyone says that.

01:20:48
And I think AI will be a big part

01:20:52
of the future, but

01:20:54
I agree with you.

01:20:55
It remains to be seen how exactly.

01:20:59
Yeah, thank you very much.

01:21:00
We're going to put the video link

01:21:02
in the show notes as well for

01:21:04
everyone interested.

01:21:06
And yeah, Jimmy,

01:21:09
thank you very much.

01:21:10
It was a pleasure having you.

01:21:12
Thanks very much. I

01:21:13
appreciate the invitation.

01:21:15
My pleasure.

01:21:16
And for the audience, we're going

01:21:19
to see or hear us next week.

01:21:22
And thank you very

01:21:23
much for being here.

01:21:25
The cloud commute podcast is sponsored by

01:21:27
simplyblock your own elastic

01:21:28
block storage engine for the cloud.

01:21:30
Get higher IOPS and low predictable

01:21:32
latency while bringing down your

01:21:33
total cost of ownership.

01:21:35
www.simplyblock.io