At Zerve, our mission is to build an amazing data science development experience. We have rethought the process of building real, practical AI solutions in a modern context, and we've boiled down what we've learned into these six principles.
Data science tools should be built for data scientists.
Modern problems are frequently too complex or too nuanced for GUI-based or automated approaches.
Wizard-based or automated approaches must be convertible to code in order to be viable for complex problems.
Automation tools such as AutoML, code completion, and automated recommendations on errors should be targeted at coders, with the purpose of making their work more efficient and accurate.
Notebooks were designed to be interactive and exploratory, and this is a critical function that must be preserved.
This interactivity is possible without sacrificing stability, repeatability, or safety.
One good test for this is that no matter how many times, or in what order, your code is executed, it should always produce the same result.
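As a minimal illustration of that test, the sketch below contrasts a cell that mutates shared state, and so gives a different answer every time it is re-run, with an idempotent version that derives its output from an untouched input. The DataFrame and column names are hypothetical.

```python
import pandas as pd

raw = pd.DataFrame({"price": [10.0, 12.0, 9.5]})  # hypothetical input data

# Not repeatable: every re-run of this line compounds the markup, so the
# result depends on how many times the cell has been executed.
# raw["price"] = raw["price"] * 1.1

# Repeatable: derive a new frame from the untouched input each time, so
# running the cell once or ten times, in any order, gives the same output.
def with_markup(df: pd.DataFrame, rate: float = 0.1) -> pd.DataFrame:
    out = df.copy()
    out["price_with_markup"] = out["price"] * (1 + rate)
    return out

priced = with_markup(raw)
print(priced)
```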
Nobody on a team should ever do anything on their local computer.
Compute, memory, and storage should autoscale on demand.
Collaboration must be in real time.
Sharing, commenting, version history, and access control are all non-negotiable.
You shouldn’t have to dump your work somewhere else in order to use it, whether for communication or for production. Moving your work elsewhere should be easy, but it should never be required to get the results you want.
Tools should support moving your work to where the end users are, whether that’s an app, an API, a PowerPoint deck, or anything else.
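As one example of moving work to where the end users are, a model a team has already trained can be put behind a small HTTP endpoint. The sketch below assumes a scikit-learn model saved to a hypothetical model.joblib file and uses FastAPI purely as an illustration, not as a prescribed stack.

```python
# Minimal sketch: serve an already-trained model as an API.
# "model.joblib" is a hypothetical artifact produced elsewhere.
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.joblib")

class Features(BaseModel):
    values: list[float]  # one row of input features

@app.post("/predict")
def predict(features: Features) -> dict:
    prediction = model.predict([features.values])[0]
    return {"prediction": float(prediction)}
```

Run it with something like `uvicorn serve:app` (assuming the file is saved as serve.py); the point is that publishing the result should be a small step from the work itself, not a rewrite.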
Nobody should have to stop their work because a long-running code block is executing. Where else in the world does this happen?
Multiple models must be trainable at the same time.
Data visualization, data exploration, feature engineering, and more should all be possible while other code is running.
Switching between tabs or windows isn’t a good solution to this problem.
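To make the concurrency requirement concrete, here is a sketch in plain Python of fanning several model fits out to separate processes so that no single long-running fit blocks the rest; the dataset and candidate models are placeholders, and in practice the development environment, not a hand-rolled process pool, should provide this.

```python
# Sketch: train several candidate models concurrently; data and models
# are illustrative placeholders.
from concurrent.futures import ProcessPoolExecutor

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=5_000, n_features=20, random_state=0)

def fit_and_score(name, model):
    model.fit(X, y)
    return name, model.score(X, y)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1_000),
    "random_forest": RandomForestClassifier(n_estimators=200),
    "gradient_boosting": GradientBoostingClassifier(),
}

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        futures = [pool.submit(fit_and_score, name, model)
                   for name, model in candidates.items()]
        for future in futures:
            name, score = future.result()
            print(f"{name}: training accuracy {score:.3f}")
```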
Data science teams should not have to rely on separate DevOps teams to provision cloud resources, approve users, or deploy their work, but they also shouldn’t be operating shadow IT organizations.
Data science development environments must support deployments that meet organizational reliability, security, and consistency requirements, without the need for rework or rewrites.
Collaboration must be seamless so teammates and managers can view, comment on, suggest changes to, and even edit each other's work in real time.