We run an internet-scale bot that crawls the whole Web 24/7, storing huge volumes of information to be indexed and structured in a timely fashion. On top of that, we are building analytical services for end users. We are developing a custom petabyte-scale distributed storage platform to accommodate all the data coming in at high speed, focusing on performance, robustness and ease of use. The performance-critical, low-level code is implemented in C++ on top of a distributed filesystem, while all the coordination logic and the communication layer, along with the API library exposed to the developer, are written in OCaml.

