At Rivigo Labs, we are building the next generation of data acquisition, processing and visualization tools that will drive change in the logistics industry. This essentially means building a data architecture that can cater to the needs of this complex system. Here are some of the considerations I have for the data architecture we are building:
1. Scalability – Can handle a growing amount of work
2. High availability – Allows continuity of work
3. Performance – Responds within a reasonable time
4. Maintainability – Ease of making future changes
5. Comprehensibility – Easy to understand
Now, there are several technologies that can fit the bill for any given architectural consideration. But how do you select the right technology?
I believe this is the wrong question to ask in the initial stages of designing a data architecture. The right questions to ask are: What are the requirements? How will information and data flow through the system? What are the events in the system, and how will they be generated? Who are the consumers of the data?
The idea is to be able to draw a simple diagram that represents the answers to these questions. This helps everyone understand the functionality and complexity of the system, and it is the right stage to start discussing the system's scalability. I find this approach extremely cost-effective, because changing anything at this stage costs no more than redrawing a diagram to reflect new assumptions and flows.
Once this stage has been hashed out, with exceptions and important edge cases considered, it is time to think about which technology to use to build the system.
This way, you are not forcing yourself to adopt a particular technology; instead, you are free to evaluate multiple technologies that meet the design.
Hemant, though the concerns you want to address with your architecture plan are apt, you have missed an important piece: what you want to monitor (online) and what you want to measure (offline). You need to know what information should be readily available in order to answer questions like “how are my systems performing?” and “given how my systems performed, what can I do to improve them?”. Once you know this, you have your requirements; moreover, it can help you visualize how the data should flow through the system.
Good thoughts, Dev. I would add that you need to know what kinds of decisions someone will make by looking at these systems. This can help shape both the application being built and the data and data science approach that needs to be taken.
Hemant, I found Martin Kleppmann’s book Designing Data-Intensive Applications really helpful while designing a complex event processing (CEP) system handling 2800 events per second. Cheers, Subhojit
Thanks for the reference.