Unfortunately it also gets many things wrong when viewed from a web application's perspective.
Three core issues:
- Long-lived connections
- Antiquated schema design
- Monolithic architecture
Let’s just focus on the last item, the monolithic architecture.
Looking at search – it’s fantastic that OpenLDAP provides a search mechanism, but how can you get funky with the search engine? Learn C! Fork OpenLDAP!
What subsystems are tightly coupled to search?
- Schema design
- Data Storage
- LDAP protocol
How do you switch search engines? You can’t… That is the trouble with monolithic designs.
What we want is a system filled with loosely coupled components: use the native search one day, swap it out for Elasticsearch the next. We want a data service that can be scaled across many servers, so you can run a search cluster on some nodes, the data storage layer on others, and so on.
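The loose coupling could be expressed as a small interface the data service depends on, with engines as interchangeable implementations. A minimal Python sketch (the class names and the `riverbear` index name are invented; the Elasticsearch calls follow the elasticsearch-py client but are only illustrative):

```python
from abc import ABC, abstractmethod

class SearchBackend(ABC):
    """The contract the data service codes against, never a concrete engine."""

    @abstractmethod
    def index(self, doc_id: str, doc: dict) -> None: ...

    @abstractmethod
    def query(self, terms: str) -> list[str]: ...

class NativeSearch(SearchBackend):
    """Toy in-memory substring search standing in for a built-in engine."""

    def __init__(self):
        self.docs = {}

    def index(self, doc_id, doc):
        self.docs[doc_id] = doc

    def query(self, terms):
        return [i for i, d in self.docs.items()
                if terms.lower() in str(d).lower()]

class ElasticsearchBackend(SearchBackend):
    """Same contract, different engine: swapping it in changes no caller code."""

    def __init__(self, client):
        self.client = client  # e.g. an elasticsearch-py client

    def index(self, doc_id, doc):
        self.client.index(index="riverbear", id=doc_id, document=doc)

    def query(self, terms):
        resp = self.client.search(index="riverbear",
                                  query={"query_string": {"query": terms}})
        return [hit["_id"] for hit in resp["hits"]["hits"]]
```

Because callers only see `SearchBackend`, switching from the native implementation to Elasticsearch becomes a configuration change rather than a fork of the server.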
There is a very good reason that OpenLDAP is monolithic… ACLs are deeply baked in and affect most other subsystems. When you search for ‘cute kittens’, you don’t just get everything. You get results that contain ‘cute kittens’, subject to the currently authenticated user and their trusted roles in the system. You may get none, a subset, or all of the records relating to these adorable creatures.
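That behavior – the same query returning different results for different callers – can be sketched in a few lines. The record contents and role names here are invented for illustration:

```python
# Every record carries an ACL; matches are filtered against the
# authenticated user's roles before anything is returned.
records = {
    "k1": {"text": "cute kittens sleeping",       "read_roles": {"public"}},
    "k2": {"text": "cute kittens (vet records)",  "read_roles": {"staff"}},
    "k3": {"text": "cute kittens adoption queue", "read_roles": {"staff", "volunteer"}},
}

def search(terms: str, user_roles: set) -> list:
    """Return only the matching record ids this user may read."""
    return [rid for rid, rec in records.items()
            if terms in rec["text"] and rec["read_roles"] & user_roles]

search("cute kittens", {"public"})   # an anonymous user sees only k1
search("cute kittens", {"staff"})    # staff see the other records instead
```

The point is that the ACL check sits inside the search path itself, which is exactly why it is hard to rip search out of a monolith that works this way.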
Most modern web applications use only one or perhaps a few credentials to access all data, and then trust the business logic to enforce privacy and security constraints. How many systems have you worked on that had only web, admin, and replication accounts for MySQL?
What if we had a web oriented data store that respected user privacy as deeply as possible? Your web application would be just another user agent.
It would be an interesting exercise to explode this architecture into contemporary standards, protocols, and established products.
A hypothetical architecture, let’s call it RiverBear:
In this diagram we see authentication via BrowserID and authorization via a new service, say RiverBearGroups (ACLs as a webservice). This service returns portable data such as JPEG, Portable Contacts, HTML, JSON, etc. Iterating over a collection of data could be done via ActivityStreams.
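What an "ACL as webservice" might answer can be sketched in-process. The RiverBearGroups name comes from the thought experiment above; the request and response fields are invented:

```python
def riverbear_groups(acl_table: dict, query: dict) -> dict:
    """Stand-in for the hypothetical RiverBearGroups service: given a
    user identity (as established by BrowserID) and a resource, answer
    allow/deny for a requested action."""
    permitted = acl_table.get(query["resource"], {}).get(query["action"], set())
    return {"allowed": query["user"] in permitted}

# Illustrative ACL table keyed by resource, then action.
acl_table = {
    "/photos/kitten.jpeg": {
        "read":  {"alice@example.org", "bob@example.org"},
        "write": {"alice@example.org"},
    },
}

riverbear_groups(acl_table, {"user": "bob@example.org",
                             "resource": "/photos/kitten.jpeg",
                             "action": "write"})   # -> {"allowed": False}
```

The web application would call this the same way any other user agent does, which is the whole point: the application holds no special database credential that bypasses the check.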
We see RiverBear coordinating these standard bits of data via search (Elasticsearch, Sphinx, whatever), file systems, and backend datastores like Drizzle/MySQL/CouchDB/MongoDB or whatever.
To improve performance and deployment flexibility, a RiverBear plugin could later be added (to subsystems with plugin architectures) which would filter out data at the lowest layer possible. Example: RiverBear-Drizzle would use the current authentication and authorization context to throw out any unauthorized data, or raise errors on modifications. Without the plugin, the same filtering happens at the frontend anyway, so the system stays secure either way – just less efficiently.
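The two placements of the same check can be sketched side by side. All names are illustrative, assuming each row carries the set of roles allowed to read it:

```python
def fetch_with_plugin(rows, user_roles):
    """Storage-layer plugin (the RiverBear-Drizzle idea): unauthorized
    rows are dropped before they ever leave the store."""
    return [r for r in rows if r["roles"] & user_roles]

def fetch_without_plugin(rows, user_roles):
    """No plugin: the store returns everything, and the frontend
    filters before responding. Same result, more data moved."""
    fetched = list(rows)  # full transfer from the store
    return [r for r in fetched if r["roles"] & user_roles]
```

Both paths return identical results to the user agent; the plugin only changes where the filtering cost is paid.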
There are many challenges to imagining this new service. How do ACLs and schema relate? Hierarchical and graph-oriented stores seem natural, but can this work with document, column, or relational stores? Security and privacy come at performance and scaling costs – is it worth it? For which types of systems?
RiverBear is just a thought experiment at this point… what are your ideas? Feedback?