A Sketch of RiverBear

OpenLDAP gets a lot of things right. It gives developers the opportunity to make really progressive designs which respect a user’s privacy and provide good security.

Unfortunately, it also gets many things wrong when viewed from a web application’s perspective.

Three core issues:

  • Long lived connections
  • Antiquated schema design
  • Monolithic architecture

Let’s just focus on the last item, the monolithic architecture.

Here is roughly slapd’s architecture.

Looking at search – it’s fantastic that it provides a search mechanism, but how can you get funky with the search engine? Learn C! Fork OpenLDAP :)

What subsystems are tightly coupled to search?

  • Schema design
  • ACL
  • Data Storage
  • LDAP protocol

How do you switch search engines? You can’t… That is the trouble with monolithic designs.

What we want is a system filled with loosely coupled components. Use native search one day, but swap it out for ElasticSearch the next. We want a data service which can be scaled across many different servers, so I can run a search cluster on some nodes, the data storage layer on other nodes, etc.
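A minimal Python sketch of what "loosely coupled" could mean for search. All names here (SearchBackend, NativeSearch, RemoteSearch, DataService) are hypothetical and invented for illustration. The remote backend is only a stand-in that delegates to a caller-supplied function; it does not use any real ElasticSearch API.

```python
from __future__ import annotations

from typing import Callable, Iterable, Protocol


class SearchBackend(Protocol):
    """Hypothetical interface: any object with a matching search() method."""

    def search(self, query: str) -> list[str]: ...


class NativeSearch:
    """Naive in-process substring search over an in-memory list."""

    def __init__(self, records: list[str]) -> None:
        self.records = records

    def search(self, query: str) -> list[str]:
        q = query.lower()
        return [r for r in self.records if q in r.lower()]


class RemoteSearch:
    """Stand-in for an external engine: delegates to a supplied callable."""

    def __init__(self, client: Callable[[str], Iterable[str]]) -> None:
        self.client = client

    def search(self, query: str) -> list[str]:
        return list(self.client(query))


class DataService:
    """Depends only on the SearchBackend interface, not on an implementation."""

    def __init__(self, backend: SearchBackend) -> None:
        self.backend = backend

    def find(self, query: str) -> list[str]:
        return self.backend.search(query)
```

Because DataService only knows about the interface, swapping engines is a one-line change (assigning a different backend) rather than a fork of the server.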

There is a very good reason that OpenLDAP is monolithic… ACLs are deeply baked in and affect most other subsystems. When you search ‘cute kittens’, you don’t just get everything. You get results that contain ‘cute kittens’, but subject to the currently authenticated user and their trusted roles in the system. You may get none, a subset, or all of the records relating to these adorable creatures.
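To make the "none, a subset, or all" behavior concrete, here is a small Python sketch of an ACL-aware search. The Record type, its allowed_roles field, and the role names are assumptions made up for this example; this is not how slapd evaluates ACLs internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    text: str
    allowed_roles: frozenset  # hypothetical: roles permitted to read this record


def search(records, query, user_roles):
    """Return records that match the query AND are readable by the user's roles."""
    q = query.lower()
    roles = set(user_roles)
    return [
        r for r in records
        if q in r.text.lower() and roles & r.allowed_roles
    ]


records = [
    Record("cute kittens at play", frozenset({"public"})),
    Record("cute kittens, private album", frozenset({"family"})),
    Record("grumpy dogs", frozenset({"public"})),
]
```

With this data, the same query returns nothing for a user with no roles, one record for a user with the public role, and both kitten records for a user holding public and family.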

Most modern web applications use only one or perhaps a few credentials to access all data and then trust the business logic to enforce privacy and security constraints. How many systems have you worked on that had only web, admin, and replication accounts for MySQL?

What if we had a web oriented data store that respected user privacy as deeply as possible? Your web application would be just another user agent.

It would be an interesting exercise to explode this architecture into contemporary standards, protocols, and established products.

A hypothetical architecture, let’s call it RiverBear:

In this diagram we see authentication via BrowserID and authorization via a new service, say RiverBearGroups (ACL as a web service). This service returns portable data such as JPEG, Portable Contacts, HTML, JSON, etc. Iterating over a collection of data could be done via ActivityStreams.

We see RiverBear coordinating these standard bits of data via search (ElasticSearch, Sphinx, whatever), file systems, and backend datastores like Drizzle, MySQL, CouchDB, MongoDB, or whatever.

To improve performance and deployment flexibility, a RiverBear plugin could later be added (to subsystems with plugin architectures) which would filter out data at the lowest layer possible. Example: RiverBear-Drizzle would use your current authentication and authorization to throw out any unauthorized data or raise errors for unauthorized modifications. Regardless, the same checks would still be done on the frontend, so the system is secure without the plugin (just less efficient).
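A rough Python sketch of that layering, under stated assumptions: PlainStore, PluginStore, authorized, and frontend_fetch are hypothetical names, and the "plugin" is simulated in-process rather than being a real Drizzle plugin. The point is only that the frontend check always runs, so the plugin changes how much data is fetched, not what the user sees.

```python
def authorized(record, user_roles):
    """True if the user holds at least one role listed on the record."""
    return bool(set(user_roles) & set(record["roles"]))


class PlainStore:
    """Backend without a plugin: returns every row, authorized or not."""

    def __init__(self, rows):
        self.rows = rows

    def fetch(self, user_roles):
        return list(self.rows)


class PluginStore(PlainStore):
    """Backend with a (simulated) plugin: drops unauthorized rows at the source."""

    def fetch(self, user_roles):
        return [r for r in self.rows if authorized(r, user_roles)]


def frontend_fetch(store, user_roles):
    """Always re-check on the frontend, whether or not the store pre-filtered."""
    rows = store.fetch(user_roles)
    return [r for r in rows if authorized(r, user_roles)]


rows = [
    {"id": 1, "roles": ["public"]},
    {"id": 2, "roles": ["admin"]},
]
```

Both stores give the same answer through frontend_fetch; the plugin-backed store simply hands fewer rows to the frontend in the first place.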

There are many challenges to imagining this new service. How do ACLs and schema relate? Hierarchical and graph-oriented stores seem natural, but can this work with document, column, or relational stores? Security and privacy come at performance or scaling costs. Is it worth it? For which types of systems?

RiverBear is just a thought experiment at this point… what are your ideas? Feedback?

3 Responses to “A Sketch of RiverBear”

1
Ludovic - 26/08/11
Did you look at Apache Directory Server?
2
ozten - 26/08/11
I haven't looked at Apache Directory Server. I'm thinking about a non-LDAP protocol. No persistent connections. No LDIF. No LDAP Schema.

Besides the protocol, is Directory Server more flexible and less monolithic than OpenLDAP?
3
Daniel Einspanjer - 26/08/11
A lot of what you are talking about here is what the Metrics team is delivering with our Bagheera service. A distributed and flexible transport layer that provides a convenient place to inject backend business logic where necessary and decoupled from the various data stores that we want to use.

The big piece we are missing is authentication and authorization. I've thought about whether we could integrate LDAP to provide those two features or whether we would need something else entirely.

I'd be very interested in hearing ideas on how that could be roped into Bagheera.
https://github.com/mozilla-metrics/bagheera