Rezha Julio

Hi!
My name is Rezha Julio
I am a chemistry graduate from Bandung Institute of Technology, currently working as a Data Engineer at Traveloka.
You can reach me by email:

contact@rezhajulio.id


Do Schemaless Databases Really Exist?

time to read 5 min | 924 words

There’s no such thing as a schemaless database. I know, lots of people want a schemaless database, and lots of companies are promoting their products as schemaless DBMSs. And schemaless DBMSs exist. But schemaless databases are mythical beasts because there is always a schema somewhere. Usually in multiple places, which I will later claim is what causes grief.

There Is Always A Schema

We should define “schema” first. It comes from Greek roots, meaning “form, figure” according to my dictionary. Wikipedia says, roughly,

A database schema is its structure; a set of integrity constraints imposed on a database. These integrity constraints ensure compatibility between parts of the schema.

In other words, a schema expresses expectations about what fields exist in a database, and what their types will be. It also enforces those expectations, at least to some extent (there’s usually some flexibility).

My claim is that there’s always a schema, because somewhere, something has expectations about what’s in a database. At least, any useful, practical, real database. The DBMS itself may not have such expectations, but something else does.

Schema In The Database

When the DBMS enforces the schema, then we say the schema is in the database. If you’re using MySQL and you try to insert a value into a column that doesn’t exist, you’ll get an error like this:

ERROR 1054 (42S22): Unknown column 'flavor' in 'field list'

Whoops. I’ll have to run an ALTER TABLE if I want to do that.
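To make this concrete, here is a minimal sketch of the same situation. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so the error text differs from the one above, and the ice_cream table is a made-up example. The point is only that the DBMS rejects the write until the schema is changed.

```python
import sqlite3

# sqlite3 stands in for MySQL here; 'ice_cream' is a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ice_cream (id INTEGER PRIMARY KEY, name TEXT)")


def insert_with_flavor(connection):
    """Try to insert a row that uses the 'flavor' column."""
    connection.execute(
        "INSERT INTO ice_cream (name, flavor) VALUES (?, ?)",
        ("cone", "vanilla"),
    )


try:
    insert_with_flavor(conn)
    rejected = False
except sqlite3.OperationalError as exc:
    # The DBMS enforces the schema: the column is unknown, so the write fails.
    rejected = True
    print("rejected:", exc)

# Changing the schema in the database makes the same insert valid.
conn.execute("ALTER TABLE ice_cream ADD COLUMN flavor TEXT")
insert_with_flavor(conn)
rows = conn.execute("SELECT name, flavor FROM ice_cream").fetchall()
print(rows)
```

The application code did not change between the two attempts. Only the schema held by the DBMS did.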

Schema In The Code

When I use MongoDB, I don’t have this problem. I can write my code to insert flavor fields in documents, read those documents back, and do something with the flavor field. I don’t have to have the schema in the database (the DBMS doesn’t have to enforce it).

Now my schema is in my code, isn’t it? I can’t do anything useful with something’s flavor attribute unless the code knows it’s there. You could argue that maybe my code doesn’t have to know about it; perhaps it just mindlessly accesses whatever it finds and lets something else do what it pleases with it. In that case, though, the schema is in the client application or user. The buck has to stop somewhere.
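A small sketch of what "the schema is in the code" can look like. Plain Python dicts stand in for documents here, with no real MongoDB driver involved, and save_item and describe are hypothetical helpers invented for illustration. The store accepts anything, but the reading code still expects a particular shape.

```python
# Plain dicts stand in for MongoDB documents; no driver or server is involved.
collection = []


def save_item(name, flavor):
    """The writer's half of the schema: every item gets a 'flavor' field."""
    collection.append({"name": name, "flavor": flavor})


def describe(doc):
    """The reader's half of the schema: it expects 'name' and 'flavor'."""
    return f"{doc['name']} tastes like {doc['flavor']}"


save_item("cone", "vanilla")
print(describe(collection[0]))

# The store accepts a document without 'flavor', but the reading code does not.
collection.append({"name": "mystery bar"})
try:
    describe(collection[1])
    failed = False
except KeyError as exc:
    failed = True
    print("missing field:", exc)
```

Nothing in the store rejected the second document. The expectation about flavor lives entirely in describe.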

It reminds me of the semantic web, microformats, and the like. All very nice, but somewhere, something or someone has to know what a person is, what an address is, what a song is, what an album and an artist are. It can’t be turtles all the way down, can it?

Schema In Both Places

I’ve just claimed that the schema is in the code if it’s not in the DBMS. If I use MySQL and add a flavor column to the table, then my DBMS knows that this attribute is valid. But even when the DBMS has the schema, the code does too. If my code doesn’t know, respect, and agree with the schema in the DBMS, then we’re going to have problems like the Unknown column error above.

This is where the fallacy enters, in my opinion. People say their database has no schema, is unstructured, etc. It would be more accurate to say “there is no single centralized schema definition. It is scattered throughout my code.”

Is that a bad thing?

In my opinion, no. A strongly enforced central definition is a dependency that doesn’t scale well, in human terms. Large codebases end up with dependencies on centralized schema definitions that are brittle and require lots of things to be updated at a single time, instead of allowing the code to cope with a fluid and evolving schema definition and gradually be updated.
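One way code can cope with a fluid schema, as a rough sketch: the reader accepts several document shapes at once, so writers can be updated gradually instead of all at the same time. The field names (flavor, and an older taste field) are assumptions made up for this example.

```python
def read_flavor(doc):
    """Return a flavor for both old and new document shapes."""
    if "flavor" in doc:   # newer documents
        return doc["flavor"]
    if "taste" in doc:    # hypothetical older field name
        return doc["taste"]
    return "unknown"      # documents written before either field existed


documents = [
    {"name": "cone", "flavor": "vanilla"},
    {"name": "bar", "taste": "chocolate"},
    {"name": "cup"},
]
flavors = [read_flavor(doc) for doc in documents]
print(flavors)
```

The trade-off is that this tolerance is now one more place where the schema is written down.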

I remember working at an ecommerce website that had many hundreds of databases, thousands of tables, and if I recall correctly, millions of stored procedures. We used a vendor tool to scan all our source code and databases and show us graphs of the relationships between all these things. After months of waiting for the indexing to complete, we opened up the application and the moment of truth arrived. “Let’s look at the order inventory table,” someone suggested. A glorious hairball emerged, slowly painting line after line until the screen was just a big black blob. It was useless and just told us what we already knew: the schema of the order inventory table was expressed in so many places, a change to it was probably impossible. I don’t know, but I’d bet a donut it hasn’t changed since then.

The other point of view on this is that the database’s job is to define the data and ensure only valid data is entered. I know this is a common point of pride among people who like PostgreSQL better than MySQL. And it’s surely valid, as well. It’s true that if the DBMS is permissive, you can end up with garbage in it. But my experience with large applications has been that this feels good at first and then becomes a problem later on. Just my two cents.

Conclusion

Since this is more or less a rant, I should not go on too much longer. Main points:

  • A database isn’t just a DBMS and the schema and data in it. The apps that interact with the data are usually part of the database per se, too.
  • There’s no such thing as schemaless. The schema is always in the code; the question is whether it’s also centrally enforced in the DBMS.
  • My experience has been that centralized schema definitions are harder to scale on large applications and codebases.

Things You Don’t Know About MongoDB

time to read 3 min | 440 words

Yes, you. #apasih

MongoDB is one of the trendiest databases right now, and the most popular in the NoSQL class. I tried it a while ago and found it quite good, but many people still don’t know its limitations and so mistake them for bugs. I wrote this article so that you know what those limitations are and using MongoDB doesn’t give you a headache :D.

Greedy for disk space

YES! The first time I used MongoDB, I found it very greedy for disk space. This follows from how MongoDB avoids disk fragmentation: it preallocates large data files. When you first create a database, say one named Rezha, its first data file, Rezha.0, immediately takes up 64 MB of disk. Brutal, right? If your application doesn’t need a large database, this is plainly wasteful. Once your application uses more than half of the space in Rezha.0, MongoDB creates a new file, Rezha.1, twice the size of the previous one, that is 128 MB. Likewise, once Rezha.1 is more than half used, Rezha.2 is created at 256 MB, and so on until each new data file reaches 2 GB.
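The doubling rule is easy to tabulate. This is only arithmetic based on the description in this post (a 64 MB first file, each new file twice the previous one, capped at 2 GB per file); it does not query a real MongoDB instance, and data_file_sizes_mb is a hypothetical helper name.

```python
def data_file_sizes_mb(count, first_mb=64, cap_mb=2048):
    """Sizes in MB of the first `count` data files under the doubling rule."""
    sizes = []
    size = first_mb
    for _ in range(count):
        sizes.append(size)
        # Each new file is twice the previous one, up to the per-file cap.
        size = min(size * 2, cap_mb)
    return sizes


sizes = data_file_sizes_mb(7)
print(sizes)       # [64, 128, 256, 512, 1024, 2048, 2048]
print(sum(sizes))  # 6080 MB allocated across seven files
```

Under these assumptions, the sixth file is the first to reach 2 GB, and every file after it stays at that size.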

If disk space is a constraint in your project, think carefully before using this database. There is a commercial product derived from MongoDB called TokuMX which, by its own claim, can reduce space usage by up to 90%.

The 32-bit limitation

The 32-bit build of MongoDB is also not great, because it has another limitation: it can only handle about 2 GB of data. That falls well short for anyone who wants to use MongoDB at large scale. The solution? Use the 64-bit build!

Expensive consulting fees

If you plan to consult the MongoDB team about a problem you have, they charge a fairly steep price: $450 per hour with a two-hour minimum, so a single consultation costs you at least $900, or about Rp 10.7 million.

Limited administration tools

If you are used to phpMyAdmin for MySQL, you may be disappointed with MongoDB, because existing tools such as RockMongo are very short on features. RoboMongo may make up for some of that, but try it yourself to be sure.

Official limitations

The sad part is that not many people look up the limitations of a technology they are about to adopt. The MongoDB staff have published a page on MongoDB’s limitations. The ones I covered above are not on that page, of course :D. I hope this article helps you know MongoDB’s limitations so that you are not caught off guard if you decide to use this database.
