In this episode of Simplyblock's Cloud Commute Podcast, host Chris Engelbert talks with Mike Freedman, co-founder and CTO of Timescale. Timescale enhances Postgres for handling time series data, analytics, and AI efficiently. The discussion covers Timescale's use of Kubernetes for scalable, decoupled compute and storage, ensuring high availability and efficient resource management, while tackling the challenges of managing stateful databases in Kubernetes.
In this episode of Cloud Commute, Chris and Mike discuss:
- Building TimescaleDB on top of Postgres for time series and analytics
- Challenges and benefits of partitioning, data lifecycle, and compression in databases
- Scaling databases in Kubernetes and overcoming stateful set limitations
- The future of AI integration and Postgres in application development
Interested in learning more about the cloud infrastructure stack, like storage, security, and Kubernetes? Head to our website (www.simplyblock.io/cloud-commute-podcast) for more episodes, and follow us on LinkedIn (www.linkedin.com/company/simplyblock-io). You can also check out the detailed show notes on YouTube (www.youtube.com/watch?v=GrS8LmPVolE).
You can find Mike Freedman on X @michaelfreedman and LinkedIn: /mfreed.
About simplyblock:
Simplyblock is an intelligent database storage orchestrator for IO-intensive workloads in Kubernetes, including databases and analytics solutions. It uses smart NVMe caching to speed up read I/O latency and queries. A single system connects local NVMe disks, GP3 volumes, and S3, making it easier to handle storage capacity and performance. With the benefits of thin provisioning, storage tiering, and volume pooling, your database workloads get better performance at lower cost without changes to existing AWS infrastructure.
👉 Get started with simplyblock: https://www.simplyblock.io/buy-now
[00:00:00] Mike: Postgres works great up to a certain scale, and then it stops working. And so what you find is that users have the option either to try to get a very niche database: move off Postgres, go to something they're less familiar with, their team's less familiar with, not as good an ecosystem for it. Or they could just adopt Timescale, basically keep everything they already know and love, and then get these kinds of superpowers on top of it.
[00:00:23] Chris: When you have compression, you do not bill for the uncompressed storage, if I understood that correctly, but for the actual storage used. So if I have a 90 percent compression ratio, that means I'm paying for 10 percent of the actual data stored.
[00:00:37] Mike: More broadly, I think we are bullish on Postgres. And what that means is that we think 99 percent of developers' problems can be solved with Postgres as the database. Now, there's always going to be that 1 percent. There's going to be the "hey, I'm building Netflix, Uber, Google." They're going to build custom, but that's not what most companies do.
[00:00:59] Intro: You're listening to Simplyblock's Cloud Commute Podcast, your weekly 20-minute podcast about cloud technologies, Kubernetes, security, sustainability, and more.
[00:01:09] Chris: Hello everyone, welcome back to the next episode of Simplyblock's Cloud Commute Podcast. This week I have a very special guest. I know I always say I have a special guest, but this week it's actually a special guest, because he's basically my old boss. I used to work for him; we've known each other for a long, long, long time, and nobody knows why, which is the best thing about it. But maybe Mike has some idea by now of how we got here. So, hey Mike, welcome to the show. Glad to have you. Maybe you can give a quick introduction of yourself.
[00:01:46] Mike: Sure, thanks for having me, Chris. Mike Freedman, I'm the co-founder and CTO here at Timescale. Timescale is a database company; we build on top of Postgres to make it powerful for time series, advanced analytics, AI, and other things. I've been working in distributed systems and storage systems for many years, not only as a founder and engineer, but also as an academic. I'm a professor of computer science at Princeton, where I've been on the faculty for the last 17 years now. So, time flies a little bit.
[00:02:23] Chris: Oh, wow. 17. I knew you were at Princeton, but I didn't know you'd been a professor for that long. Wow. All right.
[00:02:33] Mike: I have a bunch of stuff with elbow pads, and a graybeard, you know. That's perhaps my other persona.
[00:02:42] Chris: Okay, I see. All right. You already mentioned Timescale. You said it's a database on top of Postgres. Maybe you can elaborate a little bit more on that. A lot of people listening in are actually Postgres users, probably Timescale users as well.
[00:02:57] Mike: Sure. So Postgres, you know, is old. It has been a database since the nineties and has been very popular, but I think it really found a renaissance in the last five years; according to Stack Overflow, it is now the most popular database in the market. I think when people look for something today, that's what they turn to.

About 15 years ago, Postgres started introducing an extension framework. This is what allows people to think of Postgres not only in terms of the features it offers itself, the traditional OLTP database backing your website or e-commerce site, but also as something that can be extended by third parties, who actually have hooks throughout the database to make changes. In the beginning, that was things like adding an index, adding more monitoring, smaller things. But Timescale, I think, is really at the forefront of advancing what you can do on top of this data platform beyond what is offered by the core infrastructure.

And so we basically built it to also be an analytics powerhouse: to deal with time series data, event data, anything where, rather than very small updates and deletes of records, you want to ingest large amounts of data and then ask questions about it, almost how your data changes over time, aggregations on top of it. Basically insights to drive things like real-time dashboards, customer-facing APIs, and other things like that.
[00:04:43] Chris: Right. You already mentioned analytics data, time series. Maybe you can explain that a little bit. I mean, I know what it is, but there are probably still a lot of people who don't really have an idea of what a time series is and how to identify one.
[00:04:56] Mike: Sure. So we think of time series as almost any data that has a timestamp, where you might actually care about how something changes over time. And that could be in real wall-clock time: how did something differ day to day? It could also be in terms of some abstract notion of system time. For example, we have a lot of crypto companies that use us for blockchain analytics, where the "time" they think about is actually the blockchain height, which is a monotonic counter, as opposed to something that corresponds to wall-clock time.

But the nature of the data is that it's append-mostly. You are often interested in what happened recently, not randomly in what happened five years ago, but you also often want to analyze over it. So rather than simple point queries, which you certainly can do with Timescale, you're often interested in things that look more scan-like, where you want to scan over either some period of time, or by some other dimension: a device, a particular user in a gaming application, a session ID, anything that actually has that type of access as well.
[00:06:14] Chris: Right. And from a use case perspective, that is often observability data in the infrastructure world, or any kind of, you hinted at that, IoT data: device information, temperature, humidity, whatever. But it can basically be anything that has a time relation, I think.
[00:06:38] Mike: Yeah, that's how I would explain it. We like to almost say all data is time series, because as you start squinting at it... I mentioned the traditional OLTP use case, which is your e-commerce app. Well, if you actually look at that database, where, over time, if your business is successful, is most of your data going to be stored? It's something like your orders table, which is keeping a record of not just every SKU, not just every item you have, but how frequently you sold it. Well, what is an orders table? An orders table is basically a log of events with a timestamp. And you might look over time and see how your sales have changed across a time range, across a particular order, across a region. This is all what we think of as time series, or event data.

So, yeah, there have certainly been use cases in observability, but also in IoT, in manufacturing, in energy, in gaming, in product analytics, in music analytics. Anything where you would ever build a dashboard on top of your data, or analysis over your data, I think is a good use for something like Timescale.
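The orders-table example can be made concrete: treating each order as a timestamped event and bucketing by day is exactly the scan-plus-aggregation pattern being described. A minimal Python sketch (the data and helper below are purely illustrative, not Timescale's API):

```python
from datetime import datetime
from collections import defaultdict

def daily_revenue(orders):
    """Bucket timestamped order events by day and sum the order value,
    the kind of time-bucketed aggregate a sales dashboard would chart."""
    buckets = defaultdict(float)
    for ts, amount in orders:
        buckets[ts.date()] += amount
    return dict(sorted(buckets.items()))

# Hypothetical order log: (timestamp, order value)
orders = [
    (datetime(2024, 5, 1, 9, 30), 40.0),
    (datetime(2024, 5, 1, 17, 5), 10.0),
    (datetime(2024, 5, 2, 11, 0), 25.0),
]
print(daily_revenue(orders))
```

In a real hypertable this grouping would be a SQL aggregate over the time column; the point is only that an "orders table" query is naturally a scan over a time range.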
[00:07:46] Chris: Yeah, I agree. And I like the order or invoice table kind of example, because that is something people really do not consider as time series. But as you said, when you want to build a dashboard on top of that, it makes total sense. You want to understand: what is the average order value? What is the lifetime value of a customer? The average lifetime value of a customer. All that kind of stuff. And hopefully your business is successful enough that this table grows quite a bit. If not, you're probably in the wrong business.
[00:08:19] Mike: Right. And if you even think of the questions you ask: if you have a website or a web console that your users log into, typically you'll show them their orders in chronological order, or reverse chronological order, right? Which is, again, the recent stuff gets shown more frequently. Or you might allow them to say, tell me all the orders I placed during this month. Again, you have your history of all data, but it narrows to a particular user or a particular time range. That is the type of thing where the capabilities we build into the infrastructure in Timescale map very well to efficient, scalable, performant queries.
[00:09:02] Chris: So, the next question would be: why do this in an extension? I mean, Postgres gives me basically everything I need to do that. And in the worst case I can partition data. So why would I do that with an extension?
[00:09:19] Mike: Yeah. So the way I like to think of it is that what Timescale has often done is make Postgres's performance scale for particular use cases. And it is true that SQL is obviously super powerful and Postgres is super powerful; you could represent this problem in Postgres. In fact, the biggest source of Timescale users is people who started on Postgres and realized they need something else to take it to the next level, either because of scalability, performance, or cost effectiveness, or because we enable a lot of ease of use through the additional functionality built on top.

In general, why do we ever have many databases? Because what you can do under the covers is build data structures, optimizations, and other things that basically make the data both easier and more performant to manage. I could talk about various technological things that we built, but that's basically the reason: Postgres works great up to a certain scale, and then it stops working. And so what you find is that users have the option either to try to get a very niche database: move off Postgres, go to something they're less familiar with, their team's less familiar with, without as good an ecosystem for it. Or they could just adopt Timescale, basically keep everything they already know and love, and then get these kinds of superpowers on top of it.
[00:10:53] Chris: Right. So you mentioned superpowers. Maybe just give one or two examples, because I know there are some super cool features. So stop talking black magic and give the engineers something to sink their teeth into.
[00:11:10] Mike: So one of the really interesting things is that Timescale has what I'd call an interesting data lifecycle. Let me get into it. You briefly mentioned partitioning before. Timescale has these things which we call hypertables, rather than tables. And one of the big differences is that they basically do automated partitioning under the covers. You create a policy: you might say what, roughly, is the time period by which you want to partition the data. It could be a week, it could be a day; we have really high-end services that do it every 10 minutes. And you just set it and forget it. It can be modified over time and adapt as your data volumes scale.

But under the covers we're doing a bunch of things, and one thing I want to talk about with the data lifecycle is that we basically change the formatting of your data as it ages, because your needs will often change. For the most recent data, we store it in row-based form, and that is much more efficient for high ingest rates, because usually data comes in as individual rows and you can insert it row after row after row. And then, as it gets a little bit older, the engine under the covers will automatically change it to a compressed columnar format. Compression is great for cost savings: in production, we see on average something like 90 to 95 percent storage reduction once people adopt our columnar format. We actually use different compression algorithms per column, based on the data type, automatically.
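The per-column selection idea can be sketched as a simple type-to-algorithm mapping. This is a hypothetical illustration of the selection logic, not Timescale's implementation; the algorithm names are loosely modeled on columnar engines that use delta-of-delta for timestamps and integers, Gorilla-style XOR encoding for floats, and dictionary compression for repetitive text:

```python
def pick_compression(column_type: str) -> str:
    """Choose a compression algorithm for a column based on its data type.
    Hypothetical mapping for illustration only."""
    mapping = {
        "timestamp": "delta-of-delta",  # slowly varying, near-regular values
        "integer": "delta-of-delta",
        "float": "gorilla",             # XOR-based float encoding
        "text": "dictionary",           # repeated labels compress well
    }
    return mapping.get(column_type, "lz4")  # generic fallback

# A hypothetical sensor schema and the resulting per-column plan
schema = {"time": "timestamp", "device_id": "integer",
          "temperature": "float", "status": "text"}
plan = {col: pick_compression(t) for col, t in schema.items()}
print(plan)
```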
[00:12:52] Mike: And columnar means, I talked about scans before, it becomes much more efficient to do longer queries over time, particularly when you're interested in something like primary keys, where you want to scan over the user, or when you want to ask questions about certain columns as opposed to all the columns that might be in the database. And then, even third, on our cloud we now offer tiered storage, where we actually move the data: it starts in hot, high-performance row format, then high-performance columnar format, and now tiered, warm storage. This is tiering it off to S3. So it becomes this bottomless store for data that, again, is transparently queried, but now gives you, as a developer, a trade-off between the performance you get from it and how expensive the storage is.

And again, you create a hypertable, you set two policies for when you want to move something to columnar and when you want to move something to tiered storage, and everything else is handled for you. So there's an example: obviously Postgres itself doesn't have columnar storage and doesn't do this tiering. There are a lot of query optimizations on top of it, which we use to select which portion of your data we query. We build a lot of small indexes, we build sparse indexes. There's a lot of this fancy stuff under the covers. Again, as a developer, you don't really think about that; we've done it for you.
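The lifecycle described here, two set-and-forget, age-based policies moving chunks from the hot row store to compressed columnar and then to object storage, can be sketched as a simple decision function. A toy illustration of the policy, with made-up thresholds, not the real engine:

```python
from datetime import timedelta

def storage_tier(chunk_age: timedelta,
                 compress_after: timedelta = timedelta(days=7),
                 tier_after: timedelta = timedelta(days=90)) -> str:
    """Decide where a chunk of a hypertable should live, given its age and
    the two user-set policies: recent data stays in the hot row store,
    older data is compressed to columnar, the oldest is tiered to S3."""
    if chunk_age < compress_after:
        return "row"        # hot: optimized for high ingest rates
    if chunk_age < tier_after:
        return "columnar"   # compressed: optimized for analytical scans
    return "tiered"         # warm object storage: bottomless, cheapest

print(storage_tier(timedelta(days=1)))    # recent chunk
print(storage_tier(timedelta(days=30)))   # older chunk
print(storage_tier(timedelta(days=365)))  # cold chunk
```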
[00:14:23] Chris: Right, right. And as a developer, I feel like I want to get started right now. So how would I go about that?
[00:14:33] Mike: Yeah, so there are two ways. One is that Timescale is open source. You can go to GitHub, you can download it; we provide different installations that you can use. We also, as our business, build Timescale Cloud, which is a managed service. So you log in, go to timescale.com, click one button, and you have a database up and running in two minutes.

I would describe our cloud in two ways. One is that it obviously takes care of all the operational complexity you normally have from operating Postgres: HA, replicas, backup and restore, point-in-time recovery, upgrades, observability, monitoring, 24/7, all that stuff. But it is also intentional in that we know people are building these types of applications. So there are not only operational improvements, but also application-level improvements, things like tiered storage. You don't find this in something like RDS or elsewhere. This is really built because we understand our problem domain and build a cloud product for it.
[00:15:38] Chris: Right. And you said you can run it yourself; it's Apache licensed. Is there anything I need to be careful about? And how do I deploy it? Do I deploy it into Kubernetes? Does it run on Docker, a virtual machine, or whatever?
[00:15:57] Mike: All of the above. I mean, it's software, you can deploy it in many places. We provide Docker images. We provide RPMs that you can install. We ourselves use Kubernetes internally, and we know a lot of people deploy it that way. We have a Helm chart, but a lot of people deploy it with existing Kubernetes operators, the community ones built for Postgres; they work for Timescale as well.
[00:16:29] Chris: Right. You already hinted that you're using Kubernetes internally. So let's switch to that, because we're a Kubernetes podcast. And I love that we can share about the infrastructure without me having to sign an NDA for the whole community.
[00:16:45] Mike: Sure, sure. So, we're heavy users of Kubernetes. One of the things this allows us to do is operate at scale, where we have decoupled the problem of spinning up AWS instances (or other hyperscaler instances), which is often on the order of minutes, from the individual placement of containers and whatnot. We've been using Kubernetes from the beginning, so I guess it's now been almost five years since we started building our cloud, and we have obviously gone through the joys and the trials and tribulations of deploying Kubernetes at scale.

One of the big things about Timescale Cloud is that we have decoupled compute and storage. So customers can independently size their database and change the sizing at any time. If you all of a sudden want to move from two CPUs to 16 CPUs, it's a click of a button. And that can be done, in some cases with HA, with no downtime; if you're non-HA, with typically something like 30 seconds of just replacing the container, because it's decoupled from your storage. And then we manage storage as a completely separate tier that grows independently.

One of the other interesting things, again enabled by Kubernetes but also by a lot of things we do ourselves, is that normally when you think of databases, you think of hard-provisioning your storage. You need to provision a disk: I want to provision a hundred gigabytes of storage, or a terabyte of storage, as the case may be. On our cloud, we hide all of that complexity from users, and users purely have usage-based storage. What that means is they never think about allocating storage. They just start storing data. It scales as they need, we manage all of that in the backend, and they pay for only what they use. So if they're using 496 gigabytes, that's how much they pay for. And if they turn on something like compression and their storage consumption drops by half, which is what often happens with our customers, they just pay half. So we've made it easier and have allowed our users to not really think about managing those things.
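Usage-based storage, as described, means the bill tracks bytes actually stored rather than provisioned capacity, so compression flows straight through to cost. A toy model of that billing behavior (the per-gigabyte price is invented for illustration, not Timescale's pricing):

```python
def monthly_storage_bill(used_gb: float, price_per_gb: float = 0.10) -> float:
    """Bill only for bytes actually stored, with no provisioned capacity.
    price_per_gb is a hypothetical rate for illustration."""
    return used_gb * price_per_gb

before = monthly_storage_bill(496)        # the 496 GB example from the interview
after = monthly_storage_bill(496 * 0.5)   # compression halves usage, so the bill halves
print(before, after)
```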
[00:19:15] Chris: I think you mentioned an important thing, and it was basically one of the questions I would have asked anyway: when you have compression, you do not bill for the uncompressed storage, if I understood that correctly, but for the actual storage used. So if I have a 90 percent compression ratio, that means I'm paying for 10 percent of the actual data stored.
[00:19:40] Mike: Yeah, this is one big way in which, in many cases, Timescale turns out to be cheaper. Not only more scalable and performant, but it can be cheaper than even something like RDS or Aurora, because we charge you for your actual storage. Maybe on a per-gigabyte basis we charge more than RDS or Aurora, but if you're only using one tenth of the size, that obviously translates to savings.

And again, there are not only cost but also operational improvements. We regularly operate customer databases storing many terabytes of data, and Postgres starts having challenges at that scale. Not with storing data on disk and querying it, but with the operational side. Try to take a backup and then restore from it; try to use pg_dump and pg_restore on a 10 terabyte database and, you know, it's not very happy.
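The per-gigabyte-versus-bytes-stored point is easy to work through: a higher unit price applied to one tenth of the data still comes out ahead. The prices below are invented purely to illustrate the arithmetic, not actual RDS, Aurora, or Timescale rates:

```python
def monthly_cost(logical_gb: float, compression_ratio: float,
                 price_per_gb: float) -> float:
    """Cost when billing on physically stored bytes.
    compression_ratio = stored / logical, so 0.1 means a 90% reduction."""
    return logical_gb * compression_ratio * price_per_gb

logical = 1000  # 1 TB of logical data
baseline = monthly_cost(logical, 1.0, 0.10)    # uncompressed, cheaper unit price
compressed = monthly_cost(logical, 0.1, 0.25)  # 90% compression, pricier unit price
print(baseline, compressed)  # the compressed option still costs less overall
```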
[00:20:50] Chris: Even if you use it in binary mode, it will, yeah, as you said, not be happy. So you said you're running on Kubernetes, and you're hardcore Kubernetes users. We all know running a database in Kubernetes isn't as easy as it could be. So what do you think is the biggest issue or the biggest problem right now: running it at scale, making sure its performance is on point? I'm thinking of things like noisy neighbors and whatever else you can come up with. What have you seen in the real world?
00:21:30
Yeah, I think there's a
00:21:32
couple things going on.
00:21:34
One is we,
00:21:41
you know, we, the,
00:21:42
some of the OS level
00:21:48
performance isolation
00:21:49
between instances
00:21:50
we actually think
00:21:51
works quite well.
00:21:53
And we found
00:21:55
that at least at
00:21:57
that level, the-
00:21:58
on the compute side,
00:22:00
we have not run
00:22:02
into, you know,
00:22:05
it has actually
00:22:06
worked out quite well.
00:22:07
One of the big things
00:22:08
that we don't have is
00:22:11
we actually have storage
00:22:12
isolation between users.
00:22:14
So we've actually
00:22:15
seen most of the, when
00:22:18
people run into like
00:22:19
noisy, when people
00:22:21
have concerns about
00:22:22
noisy neighbors, it
00:22:23
often relates to, well,
00:22:26
one of two things.
00:22:27
One is, the fact that
00:22:31
some architectures
00:22:32
have shared storage
00:22:33
backends that doesn't
00:22:34
have actually good
00:22:35
performance isolation
00:22:36
between customers.
00:22:38
And they don't
00:22:40
have the ability to
00:22:43
basically manage IOPS
00:22:46
and bandwidth on a
00:22:47
per customer basis.
00:22:48
And that is something
00:22:49
that we actually do
00:22:49
in the cloud, that we
00:22:53
also have the ability
00:22:54
to not only do we have
00:22:55
of isolated storage
00:22:58
per user storage
00:22:59
capacity, but we
00:23:01
actually can provision
00:23:02
IOPS and bandwidth on
00:23:03
a per tenant basis.
00:23:05
And what we naturally
00:23:07
do is we actually scale
00:23:09
IOPS and bandwidth with
00:23:10
your storage capacity.
00:23:12
So when you sign up
00:23:13
without doing anything
00:23:14
else, we start you on
00:23:15
a lower level and then
00:23:16
we scale it as your-
00:23:19
as your storage
00:23:20
scales itself.
00:23:22
But we also have the
00:23:23
ability to boost it.
00:23:24
So we have this notion
00:23:25
of which kind of we
00:23:27
internally called IO
00:23:28
boost, where we have
00:23:28
the ability to kind of
00:23:29
max out on a certain
00:23:32
customer basis, kind
00:23:33
of IOPS and bandwidth.
00:23:35
Obviously it costs
00:23:35
us some extra.
00:23:36
We, you know, pass
00:23:38
some of that cost
00:23:39
on to our customers.
00:23:40
But that actually
00:23:41
allows a user to kind
00:23:43
of start nicely and
00:23:44
then scale as needed.
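To make that scaling model concrete, here is a minimal Python sketch of per-tenant IOPS provisioning that grows with storage capacity, with an opt-in "IO boost" ceiling. All constants, names, and the scaling formula are illustrative assumptions for this sketch, not Timescale's actual numbers or API.

```python
# Hypothetical sketch: provisioned IOPS scale with storage capacity,
# and a tenant can opt into a boosted ceiling. Constants are invented
# for illustration only.

BASE_IOPS = 1000      # floor every tenant starts with
IOPS_PER_GB = 10      # provisioned IOPS grow with capacity
BOOST_IOPS = 16000    # ceiling a tenant can opt into ("IO boost")

def provisioned_iops(storage_gb: float, io_boost: bool = False) -> int:
    """Scale IOPS with storage; boost raises small tenants to the ceiling."""
    scaled = BASE_IOPS + int(storage_gb * IOPS_PER_GB)
    return max(scaled, BOOST_IOPS) if io_boost else scaled
```

Under this model a new tenant starts at the floor and grows automatically as storage grows, while boost only matters until the scaled value overtakes the ceiling.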
00:23:46
You know, I think the
00:23:47
other thing I would
00:23:48
generally say with
00:23:50
Kubernetes is, you're
00:23:52
right that, you
00:23:56
know, some of its
00:23:56
early promises were
00:23:57
meant for these
00:23:58
stateless tiers of
00:24:00
horizontal nodes where
00:24:01
you don't actually
00:24:02
ever have to think
00:24:03
about the instances.
00:24:04
They're all independent.
00:24:07
And we've never really
00:24:09
thought of it that way.
00:24:11
In the beginning,
00:24:12
we actually had
00:24:13
to invest heavily
00:24:13
in writing our own
00:24:14
operators to manage a
00:24:16
lot of things because,
00:24:17
you
00:24:18
know, we don't manage
00:24:19
thousands or tens of
00:24:20
thousands of databases
00:24:22
that look the same.
00:24:23
We manage each one that
00:24:24
looks independently
00:24:25
different and has to
00:24:26
be managed separately.
00:24:28
And the other thing I
00:24:30
would say is there's
00:24:31
a lot of abstractions
00:24:33
in Kubernetes.
00:24:34
Let's take, as an
00:24:34
example, stateful sets,
00:24:37
where the thesis is that all
00:24:40
instances of a stateful
00:24:41
set look identical.
00:24:43
And that's actually
00:24:44
something that we've
00:24:45
had to fight with,
00:24:47
and then
00:24:48
eventually kind of
00:24:49
conclude that it's not
00:24:50
right for us, because
00:24:51
we actually want to
00:24:53
manage even replicas of
00:24:55
a database separately.
00:24:58
So you could do things
00:24:58
like intelligent
00:25:00
staged upgrades,
00:25:01
staged rollouts,
00:25:01
staged resizing, which
00:25:03
abstractions like
00:25:05
stateful
00:25:06
sets, which are meant to
00:25:08
make all replicas
00:25:09
identical, actually
00:25:10
aren't well suited for.
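The limitation being described can be sketched as follows: an operator that treats each replica individually can order and gate an upgrade (standbys first, primary last, stopping on failure), which a StatefulSet's identical-replica model doesn't express. This is a hypothetical illustration; the class and function names are invented, not a real operator API.

```python
# Sketch of staged, per-replica upgrades: standbys are upgraded one at a
# time before the primary, and the rollout halts if a replica is unhealthy.
# Names and structure are illustrative only.
from dataclasses import dataclass

@dataclass
class Replica:
    name: str
    role: str          # "primary" or "standby"
    version: str
    healthy: bool = True

def staged_upgrade(replicas: list[Replica], target: str) -> list[str]:
    """Upgrade standbys first, then the primary; stop on an unhealthy node."""
    order = [r for r in replicas if r.role == "standby"] + \
            [r for r in replicas if r.role == "primary"]
    upgraded = []
    for r in order:
        if not r.healthy:
            break                  # halt the rollout instead of plowing on
        r.version = target
        upgraded.append(r.name)
    return upgraded
```

The point of the sketch is the ordering and the early halt: both require treating replicas as distinct, which is exactly what the identical-replica assumption rules out.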
00:25:13
Right, right.
00:25:14
I think the last one
00:25:16
is really interesting
00:25:17
because that is
00:25:17
something that a
00:25:18
lot of people will
00:25:19
probably fight in
00:25:22
the near future, as
00:25:24
more and more
00:25:25
databases move
00:25:26
to Kubernetes.
00:25:28
I didn't think about
00:25:29
it because I always
00:25:30
considered like the
00:25:31
replicas being the
00:25:32
same, but you're right.
00:25:35
If you want to do like
00:25:35
a rolling upgrade or
00:25:36
stuff, it's going
00:25:38
to be a little bit
00:25:39
more challenging.
00:25:40
Anyway, we're pretty
00:25:41
much out of time.
00:25:42
So one last question,
00:25:44
what do you personally
00:25:45
think is like the
00:25:45
next big thing?
00:25:46
What is like upcoming?
00:25:48
What is already
00:25:49
here, but growing?
00:25:52
I know the answer, but-
00:25:55
Well, let me-
00:25:56
Maybe this is the
00:25:57
answer you thought,
00:25:58
but I'm going to give a
00:25:59
back-to-the-future answer.
00:26:00
So, you know, Timescale
00:26:02
started by building time
00:26:03
series analytics on top
00:26:05
of Postgres, and about
00:26:06
a year ago, we launched
00:26:08
our AI product, our
00:26:09
vector search product.
00:26:10
And kind of open
00:26:11
sourced it under
00:26:11
the Postgres license
00:26:12
just a month ago.
00:26:13
A lot of really
00:26:14
interesting things:
00:26:15
what we call pgvectorscale,
00:26:16
which is scalable
00:26:18
vector search, and
00:26:18
then pgai, which lets
00:26:19
you do AI things
00:26:21
like OpenAI,
00:26:23
Cohere, and Ollama
00:26:24
embeddings directly
00:26:25
in your database.
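Conceptually, what a vector search extension provides is ranking stored embeddings by distance to a query vector. A minimal pure-Python sketch of cosine-distance nearest neighbors, for illustration only; the real pgvectorscale does this inside Postgres with specialized indexes rather than a linear scan.

```python
# Illustrative cosine-distance nearest-neighbor search over embeddings.
# This is a toy linear scan, not how a production vector index works.
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def nearest(query, rows, k=1):
    """rows: list of (id, embedding); returns ids of the k closest rows."""
    return [rid for rid, emb in
            sorted(rows, key=lambda r: cosine_distance(query, r[1]))[:k]]
```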
00:26:27
More broadly, I think we
00:26:29
are bullish on Postgres.
00:26:32
And what that means
00:26:33
is that, you know, we
00:26:34
think that 99 percent
00:26:36
of developers' problems
00:26:39
can be solved with
00:26:39
Postgres for a database.
00:26:41
Now there's going to
00:26:41
be always that 1%.
00:26:43
There's going to be
00:26:43
the "Hey, I'm building
00:26:45
Netflix, Uber, Google."
00:26:47
They're going to build
00:26:48
custom, but that's not
00:26:49
what most companies do.
00:26:51
And that's both startups
00:26:52
and actually, you
00:26:54
know, we deal with
00:26:55
some companies that
00:26:56
are one month old.
00:26:56
We've dealt with
00:26:57
companies that were
00:27:00
founded in the
00:27:01
first industrial
00:27:02
revolution, right?
00:27:04
So there's a whole
00:27:05
gamut of companies
00:27:07
that we serve.
00:27:08
And for many of them,
00:27:10
the reliability,
00:27:12
usability, and
00:27:15
ecosystem of Postgres
00:27:16
are so powerful.
00:27:17
You know, they have
00:27:18
people who know it,
00:27:18
they have people who
00:27:19
trust it. They can
00:27:20
use our managed
00:27:21
cloud if they want,
00:27:22
or if they want to run it
00:27:22
themselves, there are
00:27:23
lots of options.
00:27:24
And so, we're very
00:27:26
bullish on that.
00:27:27
And, you know, we
00:27:28
continue to basically
00:27:30
think about how we
00:27:32
then, like I said, make
00:27:33
Postgres powerful for
00:27:34
all these different use
00:27:35
cases where it could
00:27:36
serve these types of
00:27:37
demanding applications.
00:27:38
So kind of expect
00:27:39
to see, you know, we
00:27:40
started in time
00:27:41
series and analytics.
00:27:43
We built an amazing
00:27:44
cloud platform
00:27:44
for Postgres.
00:27:46
And in doing so, we
00:27:46
built, like I said,
00:27:47
a cloud platform for
00:27:48
Postgres, not just
00:27:50
a cloud platform
00:27:51
for time series.
00:27:52
And so expect
00:27:53
more from that.
00:27:53
All right.
00:27:53
Fair enough.
00:27:55
Fair enough.
00:27:55
That was not the
00:27:56
answer I expected.
00:27:59
Everyone else said AI.
00:28:00
You kind of made
00:28:02
me sad here.
00:28:03
Well, AI, I actually,
00:28:06
I mean, AI,
00:28:08
but maybe I have-
00:28:08
Let me do take two
00:28:10
on that question.
00:28:12
I think five years
00:28:13
from now, we won't be
00:28:14
talking about building
00:28:15
AI applications.
00:28:16
We'll be talking about
00:28:17
building applications
00:28:18
that
00:28:19
will all incorporate AI
00:28:20
where it makes sense.
00:28:21
And so, you know, I
00:28:23
think we're going to
00:28:24
have a lot of people
00:28:25
talk about being AI
00:28:26
engineers and whatever.
00:28:27
I think this will
00:28:27
happen, but I think this
00:28:29
will actually recede
00:28:30
because this will be
00:28:31
another thing in the
00:28:33
tool belt of developers,
00:28:35
part of what it means
00:28:35
to be a modern developer.
00:28:38
All right.
00:28:38
I like that.
00:28:39
That's a beautiful
00:28:40
last sentence, I guess.
00:28:42
So, we're out of time.
00:28:44
Thank you, Mike,
00:28:45
for being here.
00:28:46
Thank you for being
00:28:46
an awesome guest.
00:28:47
And for-
00:28:50
And for the
00:28:51
audience, thank
00:28:52
you for being here.
00:28:53
Hope to see you
00:28:54
next week, same
00:28:56
place, same time.
00:28:57
And thank you very much
00:28:58
for being here as well.
00:29:02
The Cloud Commute
00:29:03
Podcast is sponsored by
00:29:04
Simplyblock, your own
00:29:05
elastic block storage
00:29:06
engine for the cloud.
00:29:07
Get higher IOPS and
00:29:09
low predictable latency
00:29:10
while bringing down your
00:29:11
total cost of ownership.
00:29:12
www.simplyblock.io

