A Time-Series Database in Postgres - WHY?
Cloud Commute · August 09, 2024
00:29:22 · 26.89 MB


In this episode of Simplyblock's Cloud Commute Podcast, host Chris Engelbert talks with Mike Freedman, co-founder and CTO of Timescale. Timescale enhances Postgres for handling time series data, analytics, and AI efficiently. The discussion covers Timescale's use of Kubernetes for scalable, decoupled compute and storage, ensuring high availability and efficient resource management, while tackling the challenges of managing stateful databases in Kubernetes.

In this episode of Cloud Commute, Chris and Mike discuss:

  • Building TimescaleDB on top of Postgres for time series and analytics
  • Challenges and benefits of partitioning, data lifecycle, and compression in databases
  • Scaling databases in Kubernetes and overcoming stateful set limitations
  • The future of AI integration and Postgres in application development

Interested in learning more about the cloud infrastructure stack, like storage, security, and Kubernetes? Head to our website (www.simplyblock.io/cloud-commute-podcast) for more episodes, and follow us on LinkedIn (www.linkedin.com/company/simplyblock-io). You can also check out the detailed show notes on YouTube (www.youtube.com/watch?v=GrS8LmPVolE).

You can find Mike Freedman on X @michaelfreedman and Linkedin: /mfreed.

About simplyblock:

Simplyblock is an intelligent database storage orchestrator for IO-intensive workloads in Kubernetes, including databases and analytics solutions. It uses smart NVMe caching to speed up read I/O latency and queries. A single system connects local NVMe disks, GP3 volumes, and S3, making it easier to handle storage capacity and performance. With the benefits of thin provisioning, storage tiering, and volume pooling, your database workloads get better performance at lower cost without changes to existing AWS infrastructure.

👉 Get started with simplyblock: https://www.simplyblock.io/buy-now


00:00:00
Postgres works great up

00:00:01
to a certain scale and

00:00:02
then it stops working.

00:00:03
And so what you find

00:00:04
is that users have the

00:00:05
option either to like

00:00:07
try to get like a very

00:00:08
niche database, move

00:00:10
off Postgres, go to

00:00:11
something they're less

00:00:12
familiar with, their

00:00:13
team's less familiar

00:00:14
with, not as good

00:00:14
an ecosystem for it.

00:00:16
Or they could just,

00:00:17
you know, adopt

00:00:18
Timescale and basically

00:00:19
keep everything they

00:00:20
already know and love

00:00:20
and then get these

00:00:21
kind of superpowers

00:00:22
on top of it.

00:00:23
When you have

00:00:23
compression, you do

00:00:24
not calculate the

00:00:25
uncompressed storage,

00:00:26
if I understood that

00:00:27
correctly, but you're

00:00:28
calculating the

00:00:30
actual storage used.

00:00:31
So if I have like a

00:00:32
90 percent compression

00:00:33
ratio, that means

00:00:34
I'm paying like for

00:00:35
10 percent of the

00:00:36
actual data stored.

00:00:37
More broadly, I think we

00:00:39
are bullish on Postgres.

00:00:41
And what that means

00:00:42
is that, you know, we

00:00:43
think that 99 percent

00:00:45
of developers' problems

00:00:47
can be solved with

00:00:48
Postgres as their database.

00:00:49
Now, there's going

00:00:50
to be always that 1%.

00:00:51
There's going to be

00:00:52
the, Hey, I'm building

00:00:53
Netflix, Uber, Google.

00:00:55
They're going to build

00:00:56
custom, but that's not

00:00:57
what most companies do.

00:00:59
You're listening to

00:00:59
Simplyblock's Cloud

00:01:00
Commute Podcast,

00:01:01
your weekly 20

00:01:02
minute podcast about

00:01:03
cloud technologies,

00:01:04
Kubernetes, security,

00:01:06
sustainability,

00:01:07
and more.

00:01:09
Hello everyone.

00:01:10
Welcome back to the next

00:01:11
episode of Simplyblock's

00:01:12
Cloud Commute Podcast.

00:01:14
This week, I have a

00:01:15
very special guest.

00:01:16
I know I always

00:01:17
say I have a special

00:01:18
guest, but this week

00:01:19
it's actually a special

00:01:20
guest because he's-

00:01:21
he's basically

00:01:22
my old boss.

00:01:22
I used to work for him.

00:01:24
We've known each other for

00:01:25
a long, long, long time.

00:01:27
and nobody knows

00:01:28
why that's the best

00:01:29
thing about it.

00:01:31
But maybe, Mike had some

00:01:33
idea by now how to get

00:01:36
here. So, hey Mike.

00:01:39
Welcome to the show.

00:01:40
glad to have you.

00:01:42
maybe you can give

00:01:43
a quick introduction

00:01:44
of yourself.

00:01:46
Sure.

00:01:46
Thanks for

00:01:46
having me, Chris.

00:01:48
Mike Freedman, I'm

00:01:48
the co-founder and

00:01:49
CTO here at Timescale.

00:01:52
Timescale is a database

00:01:54
company we build

00:01:55
on top of Postgres

00:01:56
to make it powerful

00:01:57
for time series,

00:01:58
advanced analytics,

00:01:59
AI, and other things.

00:02:01
I've been working in

00:02:04
distributed systems

00:02:05
and storage systems

00:02:06
for many years.

00:02:08
Not only as a founder

00:02:11
and engineer, but

00:02:12
I'm also an academic.

00:02:13
I'm a professor of

00:02:14
computer science at

00:02:15
Princeton, where I've

00:02:16
been on the faculty for

00:02:18
the last 17 years now.

00:02:21
So, time flies

00:02:22
a little bit.

00:02:23
Oh, wow.

00:02:24
17.

00:02:25
I knew you're at Princeton,

00:02:26
but I didn't know

00:02:27
you were a professor

00:02:29
for that long.

00:02:30
Wow.

00:02:31
All right.

00:02:33
I have a bunch of stuff

00:02:34
with elbow pads, and,

00:02:37
graybeard, you know.

00:02:38
That's perhaps

00:02:39
my other persona.

00:02:42
Okay.

00:02:42
I see.

00:02:43
I see.

00:02:43
All right.

00:02:44
You already

00:02:44
mentioned Timescale.

00:02:46
You said it's

00:02:46
a database, on

00:02:47
top of Postgres.

00:02:49
Maybe you can

00:02:49
elaborate a little

00:02:50
bit more on that.

00:02:51
A lot of people

00:02:52
listening in are

00:02:53
actually

00:02:54
Postgres users.

00:02:55
Probably

00:02:56
Timescale as well.

00:02:57
Sure.

00:02:58
So Postgres,

00:02:59
you know, old.

00:03:01
Has been a database

00:03:03
for, you know, now

00:03:05
since the nineties

00:03:06
been very popular, but

00:03:07
really I think found a

00:03:08
renaissance in the last

00:03:09
five years. According

00:03:11
to like StackOverflow

00:03:12
it's now the most popular

00:03:13
database in the market.

00:03:15
I think when people

00:03:16
today look for

00:03:17
something, that's

00:03:17
what they turn to.

00:03:19
About 15 years ago,

00:03:23
Postgres started

00:03:23
introducing an

00:03:25
extension framework.

00:03:26
And so this is what

00:03:28
allows people to think

00:03:29
of Postgres not only as

00:03:31
the features it offers

00:03:32
itself, the traditional

00:03:34
OLTP database, you know,

00:03:36
back your website,

00:03:37
e-commerce site, whatever,

00:03:39
but also a way that

00:03:40
it could be extended

00:03:41
by third parties who

00:03:43
actually have hooks

00:03:44
throughout the database

00:03:45
to make changes.

00:03:47
In the beginning,

00:03:48
that was, you know,

00:03:48
add an index, add

00:03:50
more monitoring,

00:03:51
more smaller things.

00:03:55
But you know, but

00:03:55
Timescale really,

00:03:58
I think is at the

00:03:58
forefront of advancing

00:04:01
what you could do

00:04:02
on top of this data

00:04:03
platform other than

00:04:04
what is offered by the

00:04:05
core infrastructure.

00:04:07
And so we basically

00:04:09
built it to be also an

00:04:10
analytics powerhouse to

00:04:12
deal with time series

00:04:14
data, event data,

00:04:16
anything where rather

00:04:18
than very small, you

00:04:19
know, like update,

00:04:20
deletes records, you

00:04:22
want to ingest as

00:04:23
large amounts of data

00:04:25
and then ask questions

00:04:26
about your particularly-

00:04:29
almost how your

00:04:30
data changes over

00:04:31
time, aggregations

00:04:32
on top of it.

00:04:34
Basically insights

00:04:35
to drive things like

00:04:36
real time dashboards,

00:04:38
APIs, power customer-

00:04:40
facing APIs, and other

00:04:41
things like that.

00:04:43
Right.

00:04:44
You already mentioned

00:04:45
like analytics data,

00:04:46
time series, maybe

00:04:47
you can explain

00:04:49
a little bit.

00:04:49
I mean, I know what

00:04:50
it is, but there's

00:04:51
probably still a lot

00:04:51
of people that have

00:04:52
not really an idea what

00:04:54
a time series is and

00:04:55
how to identify one.

00:04:56
Sure.

00:04:57
So we think of time

00:04:58
series as almost

00:04:59
any data that has

00:05:00
a timestamp where

00:05:01
you might actually

00:05:02
care about how

00:05:04
something changes over time.

00:05:05
Yeah.

00:05:06
And so that could be on

00:05:08
real wall clock time.

00:05:09
How did something

00:05:10
differ day to day?

00:05:11
It could also be in

00:05:12
terms of some abstract

00:05:14
notion of system time.

00:05:15
For example, we have a

00:05:16
lot of crypto companies

00:05:17
that use us for

00:05:18
blockchain analytics,

00:05:20
where the time that

00:05:21
they think about is

00:05:22
actually the blockchain

00:05:24
height, which is a

00:05:25
monotonic counter, as

00:05:28
opposed to something

00:05:29
that corresponds

00:05:30
to wall clock time.

00:05:31
But the nature of

00:05:32
data is that it's

00:05:33
append-mostly.

00:05:35
You are often interested

00:05:37
in what happened

00:05:38
recently, not equally,

00:05:40
not randomly what

00:05:41
happened five years

00:05:42
ago, but you also

00:05:44
want to often analyze

00:05:45
the stuff over it.

00:05:46
So rather than simple

00:05:48
point queries, which you

00:05:49
certainly

00:05:50
can do with Timescale,

00:05:52
you're often interested

00:05:53
in things that look more

00:05:55
scan like, where you

00:05:56
want to scan over either

00:05:58
some period of time, or

00:06:00
by some other thing, like a

00:06:03
device, in

00:06:06
a gaming application,

00:06:07
a particular user, a

00:06:09
session ID, anything

00:06:11
that actually has that

00:06:11
type of access as well.
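
The scan-style access described here, a time range plus an optional key such as a device or session ID, can be sketched in plain Python. This is purely illustrative (the data and names are invented, not Timescale's API):

```python
from datetime import datetime, timedelta

# Toy event stream: one reading per minute from three hypothetical devices.
events = [
    {"ts": datetime(2024, 8, 1) + timedelta(minutes=i),
     "device": f"dev-{i % 3}",
     "temp": 20.0 + i}
    for i in range(10)
]

def scan(rows, start, end, device=None):
    """Return rows with start <= ts < end, optionally for a single device."""
    return [r for r in rows
            if start <= r["ts"] < end
            and (device is None or r["device"] == device)]

# "What did dev-0 report in the first five minutes?" — a typical scan query.
window = scan(events, datetime(2024, 8, 1),
              datetime(2024, 8, 1, 0, 5), device="dev-0")
```

In a real time-series store this shape of query is what partitioning and columnar layouts are optimized for, since it touches a narrow time slice rather than random rows.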

00:06:14
Right.

00:06:15
And from a use case

00:06:16
perspective, that is

00:06:17
often observability

00:06:18
data in like the

00:06:20
infrastructure world.

00:06:22
It is any kind of,

00:06:23
you hinted at that,

00:06:24
like IOT data, like

00:06:25
device information,

00:06:27
temperature,

00:06:27
humidity, whatever.

00:06:30
But it can basically

00:06:33
be everything, anything

00:06:35
that has like a time

00:06:36
relation, I think.

00:06:38
Yeah.

00:06:38
And that's how I

00:06:39
would explain it.

00:06:40
We like to almost say

00:06:42
all data is time series,

00:06:44
because as you start

00:06:45
squinting at it, I

00:06:46
mean, I mentioned, you

00:06:48
know, the traditional

00:06:48
OLTP, which is your

00:06:50
e-commerce app.

00:06:51
Well, if you actually

00:06:52
look at that database,

00:06:53
where over time, if your

00:06:55
business is successful,

00:06:56
where is most of your

00:06:57
data going to be stored?

00:06:58
It's something like

00:06:59
your orders table, which

00:07:00
is keeping a record

00:07:01
of not just every SKU,

00:07:04
not just every item you

00:07:05
have, but how frequently

00:07:06
you sold that.

00:07:07
Well, what is

00:07:07
an orders table?

00:07:08
An orders table is

00:07:09
basically a log of

00:07:10
events with a timestamp.

00:07:12
And you might look

00:07:12
over the time and see

00:07:13
how your sales have

00:07:14
changed over time,

00:07:17
across a particular

00:07:18
order, across a region.

00:07:19
This is all what

00:07:21
we think of as time

00:07:22
series or, event data.

00:07:24
And so, yeah, so you

00:07:25
might think of it as

00:07:26
there are certainly

00:07:27
been use cases in

00:07:28
observability, but in

00:07:29
IOT, in manufacturing,

00:07:31
in energy, in gaming,

00:07:33
in product analytics,

00:07:34
in music analytics,

00:07:36
anything where you ever

00:07:37
would build a dashboard

00:07:38
on top of your data,

00:07:39
I think is a good,

00:07:41
or analysis over your

00:07:43
data, I think is a

00:07:43
good use for something

00:07:44
like Timescale.
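
The orders-table point can be made concrete: an orders table is a timestamped event log, so a dashboard question like "revenue per month" is just a time-bucketed aggregation. A minimal sketch, with invented data:

```python
from collections import defaultdict
from datetime import date

# Hypothetical orders: (order date, order amount in dollars).
orders = [
    (date(2024, 6, 3), 40.0),
    (date(2024, 6, 21), 60.0),
    (date(2024, 7, 2), 25.0),
]

def revenue_by_month(rows):
    """Sum order amounts into (year, month) buckets."""
    buckets = defaultdict(float)
    for day, amount in rows:
        buckets[(day.year, day.month)] += amount
    return dict(buckets)

monthly = revenue_by_month(orders)
```

The same bucketing in SQL would be a `GROUP BY` over a truncated timestamp, which is exactly the kind of query a time-series engine accelerates.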

00:07:46
Yeah, I agree.

00:07:47
And I like the, like

00:07:49
order invoice table

00:07:51
kind of stuff, because

00:07:52
that is something

00:07:53
people really do not

00:07:55
consider as time series.

00:07:56
But as you said,

00:07:57
when you want to

00:07:57
build a dashboard

00:07:58
on top of that, it

00:07:59
makes total sense.

00:08:00
You want to understand

00:08:01
like, what is the

00:08:02
average order value?

00:08:04
What is like the

00:08:05
lifetime value

00:08:05
of a customer?

00:08:07
The average lifetime

00:08:07
value of a customer.

00:08:08
All that kind of stuff.

00:08:10
And hopefully your

00:08:11
business is successful

00:08:12
enough that this table

00:08:13
grows quite a bit.

00:08:16
If not, you're probably

00:08:17
in the wrong business.

00:08:19
Right.

00:08:19
And if you even think

00:08:20
of the questions you

00:08:21
ask, you know, in your,

00:08:23
if you have a website

00:08:25
or a web console that

00:08:27
you have your users

00:08:28
log into, typically

00:08:29
you'll show them their

00:08:30
orders in chronological

00:08:31
order, or reverse

00:08:32
chronological order, right?

00:08:33
Which is again, the

00:08:34
recent stuff gets

00:08:35
shown more frequently.

00:08:37
Or you might allow

00:08:38
them to define, you

00:08:39
know, tell me all

00:08:40
the orders I placed

00:08:41
during this month.

00:08:42
Again, you have your

00:08:45
history of all data,

00:08:46
but it narrows into a

00:08:47
particular user or a

00:08:48
particular time range.

00:08:50
That is the type of

00:08:50
things where the

00:08:53
capabilities we build into

00:08:54
the infrastructure in

00:08:56
Timescale map very well

00:08:57
to efficient, scalable,

00:09:00
performant queries.

00:09:02
So, the next question

00:09:03
would be like, why do

00:09:05
this in an extension?

00:09:06
I mean, Postgres

00:09:07
gives me basically

00:09:10
everything I

00:09:11
need to do that.

00:09:12
And in the worst case

00:09:14
I can partition data,

00:09:15
but why would I do

00:09:16
that with an extension?

00:09:19
Yeah.

00:09:19
So the way I like

00:09:22
to think of it is: what

00:09:23
Timescale has often

00:09:24
done is make

00:09:25
Postgres's performance

00:09:27
scale for, you know,

00:09:29
particular apps, for

00:09:30
particular use cases.

00:09:31
And it is true that

00:09:33
like, obviously SQL

00:09:34
is super powerful

00:09:35
and Postgres is

00:09:35
super powerful.

00:09:36
You could represent this

00:09:38
problem in Postgres.

00:09:39
In fact, the biggest

00:09:41
source of users of

00:09:42
Timescale is people

00:09:43
who started on Postgres

00:09:44
and realized that

00:09:45
they need something

00:09:46
else to take it to

00:09:47
the next level, either

00:09:48
because of scalability,

00:09:50
performance, cost

00:09:51
effectiveness or, you

00:09:53
know, we enable a lot

00:09:55
ease of use because of

00:09:56
kind of the additional

00:09:57
functionality built

00:09:58
on top of that.

00:09:59
You know, in general,

00:10:01
why do we ever have many

00:10:03
databases? Because

00:10:04
what you can do under

00:10:05
the covers is you

00:10:06
build data structures,

00:10:08
optimizations,

00:10:11
other things that

00:10:12
basically make it

00:10:13
both easier and more

00:10:15
performant to manage.

00:10:16
You know, I could

00:10:17
talk about, you know,

00:10:19
various technological

00:10:20
things that we built,

00:10:22
but, you know, that's

00:10:24
basically the reason,

00:10:24
you know, it, Postgres

00:10:26
works great up to

00:10:27
a certain scale and

00:10:28
then it stops working.

00:10:29
And so what you find

00:10:31
is that users have the

00:10:33
option either to like,

00:10:35
try to get like a very

00:10:36
niche database, move

00:10:38
off Postgres, go to

00:10:39
something they're less

00:10:40
familiar with, their

00:10:41
team's less familiar

00:10:42
with, there's not as

00:10:42
good an ecosystem

00:10:43
for it, or they could

00:10:45
just, you know, adopt

00:10:46
Timescale and basically

00:10:47
keep everything they

00:10:49
already know and love

00:10:49
and then get these

00:10:50
kind of superpowers

00:10:51
on top of it.

00:10:53
Right.

00:10:53
So you mentioned

00:10:54
superpowers.

00:10:55
Maybe, maybe just

00:10:56
give one or two

00:10:57
examples, because

00:10:59
I know there's like

00:11:00
super cool features.

00:11:03
So stop talking black

00:11:04
magic and give the

00:11:07
engineers something to

00:11:08
sink their teeth into.

00:11:10
So one of the really

00:11:11
interesting things

00:11:11
is that Timescale

00:11:14
has a very, what

00:11:16
I'd call interesting

00:11:17
data life cycle.

00:11:18
And let me get into it.

00:11:21
You mentioned briefly

00:11:22
before partitioning,

00:11:24
you know, Timescale

00:11:25
has these things, which

00:11:27
we call hypertables

00:11:28
rather than tables.

00:11:29
And one of the

00:11:30
big differences is

00:11:31
they basically do

00:11:33
automated partitioning

00:11:34
under the covers.

00:11:34
You kind of

00:11:35
create a policy.

00:11:36
You might say, what

00:11:38
roughly is a time period

00:11:39
of which you want to

00:11:40
partition the data?

00:11:41
It could be a week,

00:11:42
it could be a day.

00:11:45
We have high, you

00:11:46
know, really high-end

00:11:47
services that do

00:11:48
it every 10 minutes.

00:11:50
And you just set

00:11:50
it and forget it.

00:11:51
And it could be modified

00:11:52
over time and adapt as

00:11:53
your data volumes scale.
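
The time partitioning described here can be modeled as flooring each row's timestamp to a fixed interval to pick its "chunk". In TimescaleDB itself the setup looks roughly like `SELECT create_hypertable('metrics', 'time', chunk_time_interval => INTERVAL '1 day');`; the Python below is only a toy model of the bucketing idea, with an invented epoch:

```python
from datetime import datetime, timedelta, timezone

CHUNK = timedelta(days=1)  # the partition interval policy: a day here
EPOCH = datetime(2024, 1, 1, tzinfo=timezone.utc)  # arbitrary reference point

def chunk_for(ts: datetime) -> int:
    """Index of the time chunk a row with timestamp ts belongs to."""
    return (ts - EPOCH) // CHUNK  # floor-divide two timedeltas -> int

# Rows a couple of minutes apart can still land in different chunks
# if they straddle the interval boundary.
a = chunk_for(datetime(2024, 1, 1, 23, 59, tzinfo=timezone.utc))
b = chunk_for(datetime(2024, 1, 2, 0, 1, tzinfo=timezone.utc))
```

Because each chunk covers a known time range, a query over a time window only has to touch the chunks that intersect it, which is where the scan efficiency comes from.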

00:11:56
But under the covers,

00:11:58
what we're doing is

00:11:59
a bunch of things,

00:11:59
but one thing I want

00:12:00
to talk about with

00:12:01
data lifecycle is

00:12:02
we basically kind of

00:12:04
change the formatting

00:12:05
of your data as it

00:12:06
ages because your needs

00:12:08
will often change.

00:12:09
So for the most recent

00:12:10
data, we store that

00:12:11
in row-based form,

00:12:13
and that becomes much

00:12:14
more efficient for

00:12:15
high ingest rates,

00:12:16
'cause usually

00:12:17
data comes in as an

00:12:18
individual row and

00:12:19
you can kind of insert

00:12:20
it row after row,

00:12:21
after row after row.

00:12:22
And then as it gets

00:12:23
a little bit older,

00:12:25
we basically have,

00:12:28
the engine underneath

00:12:29
the covers would

00:12:30
automatically change

00:12:31
it to a compressed

00:12:33
columnar format.

00:12:34
And so, compression is

00:12:36
great for cost savings.

00:12:38
In production, we see on

00:12:40
average something like

00:12:41
90 to 95 percent storage

00:12:44
reduction once people

00:12:45
adopt our columnar.

00:12:46
We actually use

00:12:47
different algorithms

00:12:49
per column based on the

00:12:50
data type automatically.

00:12:52
And then columnar

00:12:53
means that I talked

00:12:54
about scans before,

00:12:55
it becomes much more

00:12:56
efficient to do

00:12:59
kind of longer queries

00:13:01
over time, particularly

00:13:03
when you're interested

00:13:05
in like primary keys

00:13:06
where you want to

00:13:07
scan over the user or

00:13:08
when you want to ask

00:13:09
questions about certain

00:13:10
columns as opposed

00:13:11
to all the columns,

00:13:12
let's say, that might

00:13:12
be in the database.

00:13:14
And then even third on

00:13:15
our cloud, we offer

00:13:16
tiered storage

00:13:17
now, which is when

00:13:18
we actually store the

00:13:19
data from, you know,

00:13:21
it starts in hot, high

00:13:22
performance row format.

00:13:24
To high performance

00:13:25
columnar format, and now

00:13:27
to tiered, warm storage.

00:13:29
So this is kind of

00:13:30
tiering it off to S3.

00:13:32
So it becomes this

00:13:33
bottomless store for

00:13:35
data that, again, is

00:13:36
transparently queried,

00:13:38
but now provides

00:13:40
this trade off for

00:13:40
you as a developer.

00:13:42
You know, your trade off

00:13:42
between the performance

00:13:44
you get from it versus

00:13:45
the cost of, you

00:13:46
know, how expensive

00:13:47
the storage is.

00:13:48
And again, you

00:13:49
create a hypertable.

00:13:50
You set two policies for

00:13:52
when you want to move

00:13:54
something to columnar

00:13:54
and when you want to

00:13:55
move something to tiered

00:13:56
storage and everything

00:13:57
else is handled for you.
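
The two-policy lifecycle just described, row format while hot, compressed columnar as data ages, then the S3-backed warm tier, can be sketched as a function of row age. The thresholds below are made up for illustration; in practice you set them per hypertable:

```python
from datetime import timedelta

def tier_for(age: timedelta,
             compress_after: timedelta = timedelta(days=7),
             tier_after: timedelta = timedelta(days=90)) -> str:
    """Which storage tier a row of the given age lives in (toy model)."""
    if age < compress_after:
        return "row"          # hot: fast ingest and point queries
    if age < tier_after:
        return "columnar"     # compressed: cheap storage, fast scans
    return "object-storage"   # warm: bottomless S3-backed tier

lifecycle = [tier_for(timedelta(days=d)) for d in (1, 30, 365)]
```

Queries remain transparent across tiers; the policies only trade query performance against storage cost as rows age.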

00:13:58
So there's an example

00:13:59
of like, obviously

00:14:00
Postgres itself doesn't

00:14:01
have columnar storage,

00:14:03
doesn't do this tiering.

00:14:05
There's a lot of query

00:14:06
optimizations on top

00:14:07
of it, which we use to

00:14:09
select which portion

00:14:10
of your data we query.

00:14:12
We build a lot

00:14:13
of small indexes.

00:14:14
We build sparse indexes.

00:14:16
There's a lot of

00:14:16
this fancy stuff

00:14:17
under the covers.

00:14:18
Again, as a developer,

00:14:19
you don't really think

00:14:19
about that, kind of,

00:14:21
we've done that for you.

00:14:23
Right, right.

00:14:24
And as a developer,

00:14:27
I feel like I want to

00:14:28
get started right now.

00:14:29
So what is-

00:14:31
how would I

00:14:31
go about that?

00:14:33
Yeah, so there's

00:14:33
two ways.

00:14:33
One is Timescale

00:14:35
is open source.

00:14:36
You could go to GitHub,

00:14:37
you could download,

00:14:37
we provide, you know,

00:14:38
different installations

00:14:39
that you could use.

00:14:40
We also, in our

00:14:41
business, we built

00:14:43
Timescale cloud, which

00:14:43
is a managed service.

00:14:45
And so, you know,

00:14:45
you log in, go to

00:14:46
timescale.com, click

00:14:48
one button and you have

00:14:48
a database running,

00:14:50
you know, up and

00:14:50
starting in two minutes.

00:14:53
You know, I

00:14:55
think I would describe

00:14:55
our cloud in two ways.

00:14:56
One is, it obviously

00:14:58
takes care of all the

00:14:59
operational complexity

00:15:00
you normally have from

00:15:02
operating Postgres,

00:15:03
HA, replicas, backup

00:15:05
restore, point-in-time

00:15:07
recovery, upgrades,

00:15:09
observability,

00:15:09
monitoring, 24/7,

00:15:11
all that stuff.

00:15:13
But also it has, it is

00:15:16
intentional that we know

00:15:18
we're building these

00:15:18
types of applications.

00:15:20
So not only there

00:15:21
are operational

00:15:21
improvements.

00:15:22
But there are also

00:15:23
those like application

00:15:24
level improvements.

00:15:25
So these things

00:15:26
like tiered storage,

00:15:27
which you might have.

00:15:28
You don't find this

00:15:29
in something like,

00:15:30
you know, RDS or

00:15:31
something else.

00:15:32
This is really built

00:15:33
because we understand

00:15:34
our problem domain

00:15:35
and build a cloud

00:15:36
product for it.

00:15:38
Right.

00:15:39
Right.

00:15:39
And you said you

00:15:40
can run it yourself.

00:15:43
It's Apache licensed.

00:15:45
Is there anything I need

00:15:46
to be careful about?

00:15:47
And like, how do

00:15:50
I deploy that?

00:15:51
Do I deploy it

00:15:51
into Kubernetes?

00:15:52
Does it run on

00:15:53
Docker, like virtual

00:15:55
machine or whatever?

00:15:57
All of the above.

00:15:58
You know, you could, I

00:16:01
mean, software you can

00:16:02
deploy in many places.

00:16:05
We provide

00:16:06
Docker images.

00:16:06
We provide some RPMs

00:16:08
that you could install

00:16:09
into your thing.

00:16:11
We ourselves use

00:16:13
Kubernetes internally,

00:16:14
and we know a lot

00:16:15
of people deploy it.

00:16:18
And we have a Helm

00:16:20
chart, but a lot

00:16:20
of people deploy

00:16:22
it with existing

00:16:23
Kubernetes operators.

00:16:25
Kind of community

00:16:26
ones built for

00:16:26
Postgres, they work

00:16:27
for Timescale as well.

00:16:29
Right.

00:16:30
You already hinted

00:16:31
you're using

00:16:32
Kubernetes internally.

00:16:34
So let's switch to

00:16:35
that because we're a

00:16:36
Kubernetes podcast.

00:16:37
And I love podcasts.

00:16:39
So we can share about

00:16:41
the infrastructure

00:16:41
without me having to

00:16:43
sign an NDA for the

00:16:43
whole community.

00:16:45
Sure, sure.

00:16:47
So, you know,

00:16:47
we're heavy users

00:16:48
of Kubernetes.

00:16:49
One of the things this

00:16:50
allows us to do is, you

00:16:52
know, operate at scale,

00:16:55
where we have decoupled

00:16:57
the problem of spinning

00:16:59
up AWS instances (or

00:17:02
other hyperscaler

00:17:03
instances),

00:17:06
which are often on the

00:17:06
order of minutes from

00:17:09
individual placements of

00:17:12
containers and whatnot.

00:17:14
You know, it's been-

00:17:16
We've been using

00:17:18
Kubernetes from

00:17:19
the beginning.

00:17:19
So I guess with our

00:17:20
cloud, it's been almost

00:17:22
five years since we

00:17:23
started building it,

00:17:26
we have obviously gone

00:17:27
through the joys and the

00:17:32
trials and tribulations

00:17:33
of deploying

00:17:34
Kubernetes at scale.

00:17:36
One of the big things

00:17:37
about Timescale cloud is

00:17:40
that we have decoupled

00:17:41
compute and storage.

00:17:43
So, customers, users

00:17:45
can independently size

00:17:47
their database and

00:17:49
kind of at any time,

00:17:50
change the sizing.

00:17:51
So if you all of

00:17:51
a sudden want to

00:17:53
move from two CPUs

00:17:54
to 16 CPUs, it's a

00:17:56
click of the button.

00:17:57
And that can be done

00:17:58
with, in some cases

00:18:00
in HA, no downtime.

00:18:01
If you have non

00:18:02
HA with, you know,

00:18:03
typically something

00:18:03
like 30 seconds of just

00:18:06
replacing the container,

00:18:08
because it's decoupled

00:18:08
from your storage.

00:18:10
And then we manage

00:18:11
storage as a completely

00:18:13
separate tier that

00:18:14
grows independently.

00:18:15
One of the other

00:18:16
interesting things

00:18:17
about, again, enabled

00:18:19
by Kubernetes, but

00:18:20
also a lot of things

00:18:21
we do ourselves is

00:18:22
we, you know, normally

00:18:24
when we think of

00:18:24
databases, you think

00:18:25
of, hard provisioning

00:18:26
your storage.

00:18:27
And you need to

00:18:28
provision a disk.

00:18:29
I want to provision

00:18:30
a hundred gigabytes

00:18:31
of storage or a

00:18:33
terabyte of storage

00:18:33
as the case may be.

00:18:35
On our cloud, we

00:18:37
hide all of that

00:18:38
complexity from users

00:18:39
and users purely have

00:18:41
usage based storage.

00:18:43
So what that means is

00:18:43
they never think about

00:18:44
allocating storage.

00:18:46
They just start

00:18:46
storing data with it.

00:18:48
It scales as they need.

00:18:50
We manage all that in

00:18:52
the backend and they pay

00:18:53
for only what they use.

00:18:54
So if they're using

00:18:56
496 gigabytes, that's

00:18:58
how much they pay for.

00:18:59
And if they turn

00:19:00
on something like

00:19:00
compression and

00:19:01
it drops down by

00:19:02
half their storage

00:19:03
consumption, which

00:19:04
is what often happens

00:19:05
with our customers,

00:19:05
they just pay for half.
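
The billing mechanics described here, paying only for bytes actually stored so that compression directly cuts the bill, reduce to a one-line formula. The $/GB rate below is invented for illustration:

```python
def monthly_storage_cost(raw_gb: float, compression_ratio: float,
                         usd_per_gb: float = 0.10) -> float:
    """Usage-based bill (toy model).

    compression_ratio is the fraction of space saved, e.g. 0.9 for 90%.
    """
    stored_gb = raw_gb * (1.0 - compression_ratio)  # what actually hits disk
    return stored_gb * usd_per_gb

# 1 TB of raw data at 90% compression is billed as ~100 GB.
cost = monthly_storage_cost(1000.0, 0.9)
```

This is also why a higher per-GB list price can still come out cheaper overall: with a 90% compression ratio you are billed for a tenth of the raw volume.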

00:19:07
And so they kind of,

00:19:09
we've made it easier

00:19:10
and have allowed

00:19:10
our users not to

00:19:11
really think about

00:19:13
managing those things.

00:19:15
I think you mentioned an

00:19:16
important thing and it

00:19:17
was basically one of the

00:19:18
questions I would have

00:19:20
asked anyways, like when

00:19:22
you have compression,

00:19:24
you do not calculate

00:19:25
the uncompressed

00:19:26
storage, if I understood

00:19:28
that correctly.

00:19:28
But, you're calculating

00:19:31
the actual storage used.

00:19:32
So if I have like a

00:19:34
90 percent compression

00:19:35
ratio, that means

00:19:35
I'm paying like for

00:19:37
10 percent of the

00:19:38
actual data stored.

00:19:40
Yeah, this is one big

00:19:41
way, where in many

00:19:43
cases, Timescale turns

00:19:45
out to be cheaper.

00:19:46
Not only more scalable

00:19:47
and performant, but can

00:19:48
be cheaper than even

00:19:51
something like using

00:19:52
RDS or Aurora, because,

00:19:55
you know, we'll charge

00:19:56
you for your storage.

00:19:58
And so maybe on a per

00:20:00
gigabyte of storage,

00:20:01
we are charging more

00:20:03
than RDS or Aurora.

00:20:05
But if you're only using

00:20:06
one tenth of the size.

00:20:08
You know, it obviously

00:20:09
translates to savings.

00:20:11
Not only that; again,

00:20:12
there are also

00:20:14
operational

00:20:16
improvements, you know.

00:20:17
We regularly operate

00:20:20
with customer databases

00:20:21
who are storing many

00:20:22
terabytes of data and

00:20:25
Postgres, you know, has-

00:20:28
starts getting

00:20:29
challenges at

00:20:30
that scale.

00:20:31
Not, you know,

00:20:33
storage on disk and

00:20:34
querying it, but the

00:20:35
operational side.

00:20:37
Try to take a backup

00:20:38
and then restore from

00:20:40
a, you know, try to

00:20:41
use pg_dump/pg_restore

00:20:42
on a 10 terabyte

00:20:45
disk and, you know,

00:20:46
it's not very happy.

00:20:50
Even if you use it in

00:20:51
a binary mode, it will

00:20:53
yeah, as you said, it's

00:20:54
not going to be happy.

00:20:56
So you said you're

00:20:58
running on Kubernetes

00:20:59
and you're hardcore

00:21:01
Kubernetes users.

00:21:03
So, we all know

00:21:05
running a database in

00:21:06
Kubernetes isn't as

00:21:08
easy as it could be.

00:21:09
So what do you think

00:21:10
is like the biggest

00:21:11
issue or the biggest

00:21:13
like problem right

00:21:15
now, running it at

00:21:16
scale, making sure

00:21:18
it's performance is

00:21:20
on point and you think

00:21:22
like stuff like noisy

00:21:23
neighbor and, whatever

00:21:24
you can come up with.

00:21:25
Well, what have you

00:21:26
seen in the real world?

00:21:30
Yeah, I think there's a

00:21:32
couple things going on.

00:21:34
One is,

00:21:41
you know,

00:21:42
some of the OS level

00:21:48
performance isolation

00:21:49
between instances

00:21:50
we actually think

00:21:51
works quite well.

00:21:53
And we found

00:21:55
that at least at

00:21:57
that level, the-

00:21:58
on the compute side,

00:22:00
we have not run

00:22:02
into, you know,

00:22:05
it has actually

00:22:06
worked out quite well.

00:22:07
One of the big things

00:22:08
that we do have is

00:22:11
we actually have storage

00:22:12
isolation between users.

00:22:14
So we've actually

00:22:15
seen most of the, when

00:22:18
people run into like

00:22:19
noisy, when people

00:22:21
have concerns about

00:22:22
noisy neighbors, it

00:22:23
often relates to, well,

00:22:26
one of two things.

00:22:27
One is, the fact that

00:22:31
some architectures

00:22:32
have shared storage

00:22:33
backends that don't

00:22:34
have actually good

00:22:35
performance isolation

00:22:36
between customers.

00:22:38
And they don't

00:22:40
have the ability to

00:22:43
basically manage IOPS

00:22:46
and bandwidth on a

00:22:47
per customer basis.

00:22:48
And that is something

00:22:49
that we actually do

00:22:49
in the cloud, that we

00:22:53
also have the ability

00:22:54
to not only do we have

00:22:55
of isolated storage

00:22:58
per user storage

00:22:59
capacity, but we

00:23:01
actually can provision

00:23:02
IOPS and bandwidth on

00:23:03
a per tenant basis.

00:23:05
And what we naturally

00:23:07
do is we actually scale

00:23:09
IOPS and bandwidth with

00:23:10
your storage capacity.

00:23:12
So when you sign up

00:23:13
without doing anything

00:23:14
else, we start you on

00:23:15
a lower level and then

00:23:16
we scale it as your-

00:23:19
as your storage

00:23:20
scales itself.

00:23:22
But we also have the ability to boost it. We have this notion that we internally call IO boost, where we have the ability to max out IOPS and bandwidth for a particular customer. Obviously it costs us some extra, and we pass some of that cost on to our customers, but it allows a user to start nicely and then scale as needed.
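The scale-with-capacity plus boost behavior described here can be sketched roughly like this. The base rate, per-GB slope, ceiling, and boost semantics are illustrative assumptions, not Timescale's actual numbers:

```python
def provisioned_iops(storage_gb: int, boost: bool = False,
                     base: int = 1_000, per_gb: int = 3,
                     ceiling: int = 16_000) -> int:
    """Return the IOPS provisioned for a tenant (hypothetical constants).

    Baseline IOPS grow linearly with storage capacity up to a ceiling;
    enabling "boost" jumps straight to the ceiling, for a surcharge.
    """
    if boost:
        return ceiling
    return min(ceiling, base + per_gb * storage_gb)
```

The same shape would apply to bandwidth: a capacity-linked baseline so small tenants start cheap, with an explicit opt-in to the maximum.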

00:23:46
You know, the other thing I would generally say with Kubernetes is, you're right that some of its early promises were meant for these stateless tiers of horizontal nodes, where you don't actually ever have to think about the instances; they're all independent. And we've never really thought of it that way. In the beginning, we actually had to invest heavily in writing our own operators to manage a lot of things, because we don't manage thousands or tens of thousands of databases that all look the same. We manage each one, and each looks different and has to be managed separately.

00:24:28
And the other thing I would say is there are a lot of abstractions in Kubernetes. Take StatefulSets as an example, where the thesis is that all instances of a StatefulSet look identical. That's actually something we've had to fight with and eventually conclude is not right for us, because we actually want to manage even replicas of a database separately. That way you can do things like intelligent staged upgrades, staged rollouts, and staged resizing, which abstractions like StatefulSets, meant to make all of these instances identical, actually aren't well suited for.
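The point about managing replicas individually can be illustrated with a toy staged-upgrade plan: standbys go first, one at a time so each can be verified, and the primary goes last. That per-instance ordering is exactly what a StatefulSet's uniform rolling update does not express. The instance shape and role names below are made up for illustration:

```python
def staged_upgrade_order(instances: list[dict]) -> list[str]:
    """Order database instances for a staged upgrade.

    Standbys are upgraded first (one at a time, so each can be
    health-checked before the next), and the primary goes last, once
    a failover target is already running the new version.
    """
    standbys = [i["name"] for i in instances if i["role"] != "primary"]
    primary = [i["name"] for i in instances if i["role"] == "primary"]
    return standbys + primary

cluster = [
    {"name": "db-0", "role": "primary"},
    {"name": "db-1", "role": "standby"},
    {"name": "db-2", "role": "standby"},
]
plan = staged_upgrade_order(cluster)
```

A custom operator can encode this kind of ordering, plus per-instance sizing and configuration, where a StatefulSet assumes every pod is interchangeable.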

00:25:13
Right, right. I think the last one is really interesting, because that is something a lot of people will probably fight with in the near future, when more and more databases move to Kubernetes. I hadn't thought about it, because I always considered the replicas as being the same, but you're right: if you want to do something like a rolling upgrade, it's going to be a little bit more challenging. Anyway, we're pretty much out of time. So one last question: what do you personally think is the next big thing? What is upcoming? What is already here, but growing? I know the answer, but-

00:25:55
Well, let me- Maybe this is the answer you thought of, but I'm going to give a back-to-the-future answer. So, you know, Timescale started by building time-series analytics on top of Postgres, and about a year ago we launched our AI product, our vector search product, and open sourced it under the Postgres license just a month ago. There's a lot of really interesting stuff there: what we call pgvectorscale, which is scalable vector search, and then pgai, which lets you do a lot of AI things, like OpenAI, Cohere, and Ollama embeddings, directly in your database.
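As a rough illustration of what vector search in Postgres looks like with the pgvector extension (which pgvectorscale builds on), a nearest-neighbor query simply orders rows by a distance operator such as `<=>` (cosine distance). The table and column names here are hypothetical:

```python
def knn_query(table: str, column: str, k: int = 5) -> str:
    """Build a pgvector nearest-neighbor SQL query.

    pgvector's "<=>" operator computes cosine distance; the query
    embedding is passed as a bind parameter, e.g. via psycopg:
        cur.execute(knn_query("documents", "embedding"), (vec,))
    """
    return (
        f"SELECT id FROM {table} "
        f"ORDER BY {column} <=> %s LIMIT {k}"
    )

sql = knn_query("documents", "embedding", k=3)
```

Because it is just SQL, the same query can join vector similarity against ordinary relational filters, which is much of the appeal of keeping embeddings in Postgres.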

00:26:27
More broadly, I think we are bullish on Postgres. And what that means is that we think 99 percent of developers' problems can be solved with Postgres as their database. Now, there's always going to be that 1 percent. There's going to be the "hey, I'm building Netflix, Uber, Google." They're going to build custom, but that's not what most companies do. And that's both startups and, you know, we deal with some companies that are one month old, and we deal with companies that were built in the first industrial revolution, right? So we serve a large gamut of companies.

00:27:08
And for many of them, the reliability, the usability, the ecosystem of Postgres is so powerful. They have people who know it, they have people who trust it; whether they want to use our managed cloud or run it themselves, there are lots of options. And so we're very bullish on that, and we continue to think about how we make Postgres powerful for all these different use cases, where it can serve these types of demanding applications. So kind of expect to see more: we started in time series and analytics, and we built an amazing cloud platform for Postgres. And in doing so, we built, like I said, a cloud platform for Postgres, not just a cloud platform for time series. And so expect more from that.

00:27:53
All right. Fair enough, fair enough. That was not the answer I expected. Everyone else said AI. You kind of made me sad here.

00:28:03
Well, AI, I actually- I mean, AI, but maybe let me take a second try at that question. I think five years from now, we won't be talking about building AI applications. We'll be talking about building applications, and they will all incorporate AI where it makes sense. And so, you know, I think we're going to have a lot of people talking about being AI engineers and whatever. I think this will happen, but I think it will actually recede, because it will become another thing in the tool belt of developers, part of what it means to be a modern developer.

00:28:38
All right, I like that. That's a beautiful last sentence, I guess. So, we're out of time. Thank you, Mike, for being here. Thank you for being an awesome guest. And for the audience, thank you for being here. Hope to see you next week, same place, same time.

00:28:57
And thank you very much for being here as well.

00:29:02
The Cloud Commute Podcast is sponsored by simplyblock, your own elastic block storage engine for the cloud. Get higher IOPS and low, predictable latency while bringing down your total cost of ownership. www.simplyblock.io