41 articles and counting

Demystifying BrowserID and Mozilla’s LDAP backed webapps

IT, Operations, Security and Webdev have prepared a BrowserID adoption plan for our LDAP directory. Questions come up frequently, so I wanted to capture the basics of the plan.

In terms of BrowserID adoption, there are three classes of Mozilla websites.

  1. Many webapps that use MoCo’s LDAP instance for authentication and group permissions
  2. Mozillians.org which uses a new community LDAP instance for auth and groups
  3. Many more webapps that manage their own authentication (via MySQL or whatever)

Class 3 has nothing to do with LDAP, so we can dogfood BrowserID there on a case-by-case basis.

Classes 1 and 2 depend on LDAP, and what that means is fairly confusing, so I wanted to clarify our current plan and where we are.

The Cast

We all know the Frog Prince story.

BrowserID is a lonely Princess as well as a protocol that gives the person currently using your webapp a way to disclose a verified email address.

LDAP (aka slapd) is an unloved frog that moonlights as a backend database; it authenticates people and keeps track of who is in what group. It’s got some warts, but don’t be so superficial.

sasl-browserid is the magical golden ball. Huh? Ya, blame the Brothers Grimm.

When our princess drops the golden ball into the frog’s pond, poof, the frog transmogrifies into the princess’ long-sought-after life partner. And they live happily ever after…

Okay, sasl-browserid is more of a C plugin than a mystical sphere, but whatever. This plugin BrowserID-enables LDAP.

Technical Details

This will be short and painful, I promise.

Before sasl-browserid

  • You enter your email address and password into a form
  • Middleware code does a simple bind with a shared account
  • Code does a search filtering by your email address to find your DN
  • Code does a second simple bind using your DN and password, attempting to authenticate on your behalf:
    • You are a known user with a good password
    • Or not so much
  • More stuff happens.

simple bind, search, filter and DN are LDAP terms.
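To make the “before” steps concrete, here is a minimal Python sketch of that bind, search, bind dance. It does not talk to a real server: `FakeDirectory`, `InvalidCredentials`, and `authenticate` are hypothetical stand-ins (real clients such as python-ldap expose these operations as `simple_bind_s` and `search_s`), and the subtree search is simplified to a DN suffix match.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


class InvalidCredentials(Exception):
    """Stand-in for an LDAP invalidCredentials (result code 49) error."""


@dataclass
class FakeDirectory:
    """Hypothetical in-memory directory; not a real LDAP client."""
    entries: Dict[str, Dict[str, str]]  # dn -> {attribute: value}
    passwords: Dict[str, str]           # dn -> password

    def simple_bind(self, dn: str, password: str) -> None:
        if self.passwords.get(dn) != password:
            raise InvalidCredentials(dn)

    def search(self, base: str, attr: str, value: str) -> List[str]:
        # Subtree search simplified to a suffix match on the DN
        return [dn for dn, attrs in self.entries.items()
                if dn.endswith(base) and attrs.get(attr) == value]


def authenticate(directory: FakeDirectory, email: str, password: str,
                 service_dn: str, service_pw: str, base: str) -> Optional[str]:
    # 1. Simple bind as the shared service account
    directory.simple_bind(service_dn, service_pw)
    # 2. Search for the entry whose mail attribute matches, to find the DN
    matches = directory.search(base, "mail", email)
    if len(matches) != 1:
        return None  # unknown (or ambiguous) user
    # 3. Never bind with an empty password: that is an "unauthenticated
    #    bind" (RFC 4513), which some servers treat as anonymous success.
    if not password:
        return None
    try:
        directory.simple_bind(matches[0], password)
    except InvalidCredentials:
        return None  # known user, bad password
    return matches[0]
```

The empty-password guard is the classic gotcha of this flow: without it, some servers will happily "authenticate" anyone who leaves the password field blank.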

After sasl-browserid

  • You click a link or button and do the BrowserID dance. Frontend code gets a BrowserID assertion and sends it to middleware.
  • Middleware code does a sasl interactive bind with your assertion and audience
  • Code uses ldap’s whoami* and checks the output
    • You are a known user in the system
    • You are an unknown user in the system
  • More stuff happens

sasl interactive bind and whoami are also LDAP terms.
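The “known vs. unknown” check after the SASL bind could look roughly like this Python sketch. It assumes OpenLDAP’s behaviour of reporting a SASL identity that no `authz-regexp` rule maps into the tree in its default `uid=<user>,cn=<mechanism>,cn=auth` form, so a whoami result under your own base DN means a known user. The function name, the `cn=browserid` mechanism component, and the example DNs are illustrative assumptions, not sasl-browserid’s documented output.

```python
from typing import Optional, Tuple


def classify_whoami(authzid: str, base_dn: str) -> Tuple[str, Optional[str]]:
    """Classify an RFC 4532 "Who am I?" response after a SASL bind.

    Returns ("anonymous", None), ("known", dn) or ("unknown", dn_or_None).
    """
    if not authzid:
        return ("anonymous", None)  # an empty authzId means anonymous
    if not authzid.startswith("dn:"):
        return ("unknown", None)  # e.g. a "u:" form; not expected from slapd
    dn = authzid[len("dn:"):]
    # Simplified suffix test; real code should compare normalized DNs
    if dn.lower().endswith("," + base_dn.lower()):
        return ("known", dn)  # mapped onto an entry in our tree
    return ("unknown", dn)  # still in the SASL default namespace (cn=auth)
```

In python-ldap the `authzid` string would come from `whoami_s()`; the `ldapwhoami` command-line tool prints the same value.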

If that didn’t make your eyes glaze over, there are more gnarly details and an architectural diagram here.

What changed between before and after BrowserID? The authentication mechanism. What usually doesn’t change? The authorization and business logic of an application. Both flows end with a "More stuff happens" step, which means the goodness your app does and the permissions your users have usually just continue to work.

We do not have to re-write our internal apps to take advantage of BrowserID.

Deployment Plan

sasl-browserid is getting a thorough review from the platform security team. In particular, David Chan has been doing a kick-ass job of finding flaws and ways to improve it.

Once it is ready, we’ll deploy it to development stage servers (in the next week or two).

But which app should we start with?

Our MoCo LDAP powers many, many Mozilla tools. This goes beyond web applications and bleeds into desktop, command-line, and other uses; Reed’s biological functions are literally regulated by MoCo LDAP. It’s got both critical and sensitive information, so we have to be careful, but luckily we’ve also recently launched…

Mozillians.org also has an LDAP server backend. It has a lower risk profile, so we’ll deploy to it first. It’s lower risk for a bunch of reasons, including:

  • Smaller, newer, fresher
  • If it falls over, it won’t block Firefox development or other critical Mozilla processes
  • It has a dedicated team to monitor, diagnose, and fix any issues sasl-browserid introduces

The current plan is to BrowserID-enable mozillians.org in its 1.2 release (3-5 weeks out). This will give us experience with sasl-browserid. Once it has proven itself production ready…

We’ll add the plugin to MoCo LDAP. We’ll update the internal phonebook’s PHP code. We’ll test, deploy, wait and see.

If that goes well, then all the other LDAP backed webapps can follow.

Summary

Ladies and gentlemen… I use the acronym LDAP with all due respect for the Fear and Loathing I’m sure it provokes in each and every one of you…

We’ve made an organizational commitment (training, hiring, etc) to improving our ability to maintain our aging LDAP infrastructure. Jabba and Corey have done a top-notch job of revamping the MoCo LDAP, so it’s a sustainable sub-system.

We can adopt BrowserID without throwing away MoCo’s LDAP backend by utilizing sasl-browserid.

Questions/Answers

  • This will not eliminate passwords from LDAP. Someone figure that part out, please.
  • This plugin doesn’t automagically convert your webapp to BrowserID. Some additional coding is required
  • Technically, this is not Single Sign On, but has some of the same useful properties
  • Yes, I’m happy to help you design/setup/whatever your BrowserID solution. Ping me in #identity.
  • Yes, this plugin might be reusable by large organizations such as schools, businesses, and pillow fight flash mobs. Please spread the word.
  • Technically sasl-browserid should work with any Cyrus SASL enabled client or server. OpenLDAP and Postfix are two known servers, are there more? Clients?

Let’s figure out what works well and where we can improve the BrowserID project. If we can do LDAP, we can do anything ;)

* Some programming languages lack whoami bindings; a search can be used to emulate this check.

A Dart Daydream

Since Google released the Dart spec today, I wanted to share my speculation on the reason Dart exists: Dart is a new technology to solve a cross-organization engineering people problem, internal to Google.

I have no inside knowledge, but reading between the lines, Google had several sets of engineering efficiency problems.

This is not surprising. Google is a massive, incredible engineering organization. They are creating cutting edge solutions to some of the world’s hardest problems. I would expect them to have huge people challenges; sudo herd cats still doesn’t work.

Classic engineering problems

  • Code reuse
  • Cost of shipping a project, etc.
  • Communication
  • Standards (Code Quality, Security, etc)

A classic mistake is to solve these things with technology first, before solving the actual people-oriented problems. Technology can help enforce standards and make code reuse easier, but tech alone will fail.

Imagined Problems

As I daydreamed, I tried to imagine problems that could result in Dart as a (flawed) solution…

Perhaps Google spun up 6 teams to investigate and solve 6 problems (Why isn’t project X using GWT?, Why did project Y fail? Why did project Z take 6 extra months and 30% more $$$ to ship?).

Google being an awesome company, these postmortems may have bubbled up into a more fundamental question: How can we solve this once and for all?

One of many possible analyses:

An example of flawed logic that could lead to project Dart:

  • Writing webapps on GWT is better (for our Java developers) than hacking together JavaScript, so
  • All our teams should use GWT instead of JavaScript, but
  • We couldn’t get everyone on board with GWT, so
  • If Google could replace JavaScript with a new clean language, we wouldn’t have these engineering team problems, and it turns out
  • We employ some key language pioneers in enterprise languages, so
  • Google needs a ‘programming in the large’ friendly platform like Java was for the last decade.

The end goal is enticing

  • Lower power usage and data center costs, be greener
  • Ramp up new engineers quickly
  • All projects reusing standard language and APIs
  • Lower cost of engineers switching teams
  • Avoid legally tainted platforms (Java)
  • Provide puppies and rainbows for all

Some of these are real issues. Some of these are nice-to-haves. Technology alone isn’t the right solution to all of these problems.

Perspective

Some, definitely not all, but some people at Google have this perspective:
Google hires all the smartest people of the world. We will figure out how to solve the world’s problems; you will live in our resulting products. You’re welcome.

Sometimes being the smartest person can blind one to the actual root problem. An engineering bent makes one look at fixing tools instead of process.

Solution

Google has two tactical thrusts, to continue investing in the next generation of web standards (Yay!) and to replace JavaScript with Dart for all internal mobile and desktop app development (Yawn!).

Engineering problems like the ones I invented could have caused this new tactical decision; engineering process problems are hard to solve in any large organization. JavaScript is a detail, not the cause of many of these types of problems.

In 5 years, various hypothetical teams across my imaginary Google will have drifted back into a Dart-based Babylon, if hypothetical Google doesn’t fix the core communication and standardization issues which may have prompted Dart’s creation.

It is possible to address root causes directly. Google’s Java programmers want GWT, but Google’s native web programmers want JavaScript… Make every employee use GWT; this fits with the “only C, Go, Python, or Java” language stance. Introduce training to convert web programmers into Java-oriented programmers as part of engineer on-boarding.

These are not fun decisions to make or communicate, but that is the point of project and engineering management.

Creating Dart is expensive (language, tools, runtime, etc), but may give them the social framework to communicate people oriented engineering changes (employees will all use Dart) that address the real core issues. So ultimately Dart might not be a loss for Google internally.

Dart alone will do nothing to solve the root problems that prompted Dart.

Day Dream

Again, this post is based on a hypothetical situation internal to Google. I’ve pieced together this daydream from my own “Enterprise” experiences, various Google posts over the years, etc., to work out why Google would create the Dart platform.

I hope I haven’t offended any of my friends at Google; my speculation is probably wildly inaccurate. If nothing else, this post is a parable about solving people problems directly before building tools.

But the real world announcement has some real world impact…

Collateral Damage from the Dart Launch

An unintended consequence of the Dart launch is that it may weaken Google’s effectiveness in promoting web standards. It is a very confusing story to explain how promoting the Dart platform and working towards next-generation, open web standards are not conflicting Google projects.

Example: Dart versus CoffeeScript

Early adopters in the webdev community don’t see Dart as more attractive than CoffeeScript, which they see as having a superior syntax. Since both must compile down to JS, they are equally viable for experimenting with language semantics. So at first blush, Dart is a meh to CoffeeScript commentators on Hacker News and Reddit.

Aspects of CoffeeScript may inform the next generation of JavaScript standards. CoffeeScript comes from the community organically, not from ‘the one official smart guy in the room’.

I’m probably wrong, as I don’t have much insight into Google’s engineering organizations, but it is quite possible that Dart is a facile throw at the wrong target.

3 Years with Mozilla

It was 3 years ago today that morgamic and bretr foolishly hired… er, allowed me to join Mozilla.

But what have you done for me lately?

Here is my upcoming lightning talk for “webengagement”. It covers what I’m working on (and with whom):


36 months, and I feel like I am just getting started! Onward.

A Sketch of RiverBear

OpenLDAP gets a lot of things right. It gives developers the opportunity to make really progressive designs which respect a user’s privacy and provides good security.

Unfortunately it also gets many things wrong, when viewed from a web application’s perspective.

Three core issues:

  • Long-lived connections
  • Antiquated schema design
  • Monolithic architecture

Let’s just focus on the last item, the monolithic architecture.

Here is roughly slapd’s architecture.

Looking at search: it’s fantastic that it provides a search mechanism, but how can you get funky with the search engine? Learn C! Fork OpenLDAP :)

What subsystems are tightly coupled to search?

  • Schema design
  • ACL
  • Data Storage
  • LDAP protocol

How do you switch search engines? You can’t… That is the trouble with monolithic designs.

What we want is a system filled with loosely coupled components. Use native search one day, but swap it out for Elastic Search the next. We want a data service which can be scaled across many different servers, so I can run a search cluster on some nodes, the data storage layer on some nodes, etc.

There is a very good reason that OpenLDAP is monolithic… ACLs are deeply baked in and affect most other subsystems. When you search ‘cute kittens’, you don’t just get everything. You get results that contain ‘cute kittens’, but subject to the currently authenticated user and their trusted roles in the system. You may get none, a subset, or all of the records relating to these adorable creatures.
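Here is a toy Python sketch of the loosely coupled version of that idea: the search engine is just a swappable function, and a separate ACL check decides which hits the current user may see. `Record`, `naive_engine`, `prefix_engine`, and `acl_search` are hypothetical names, and roles are plain strings; OpenLDAP’s real ACL evaluation is far richer.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class Record:
    dn: str
    text: str
    readers: FrozenSet[str]  # roles allowed to read this record


# Any callable with this shape can act as the search engine
SearchEngine = Callable[[str, Iterable[Record]], List[Record]]


def naive_engine(query: str, records: Iterable[Record]) -> List[Record]:
    """Case-insensitive substring search."""
    q = query.casefold()
    return [r for r in records if q in r.text.casefold()]


def prefix_engine(query: str, records: Iterable[Record]) -> List[Record]:
    """A different engine: only matches at the start of the text."""
    q = query.casefold()
    return [r for r in records if r.text.casefold().startswith(q)]


def acl_search(query: str, records: Iterable[Record], roles: Set[str],
               engine: SearchEngine = naive_engine) -> List[Record]:
    # The engine finds candidates; the ACL decides what this caller may see
    return [r for r in engine(query, records) if r.readers & roles]
```

Swapping `naive_engine` for `prefix_engine` (or for a client of an external search service) changes how matching works without touching the ACL step.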

Most modern web applications use only one or perhaps a few credentials to access all data and then trust the business logic to enforce privacy and security constraints. How many systems have you worked on that had only web, admin, and replication accounts for MySQL?

What if we had a web oriented data store that respected user privacy as deeply as possible? Your web application would be just another user agent.

It would be an interesting exercise to explode this architecture into contemporary standards, protocols, and established products.

A hypothetical architecture, let’s call it RiverBear:

In this diagram we see authentication via BrowserID and authorization via a new service, say RiverBearGroups (ACL as webservice). This service returns portable data such as jpeg, portable contacts, html, JSON, etc. Iterating a collection of data could be done via ActivityStreams.

We see RiverBear coordinating these standard bits of data via Search (ElasticSearch, Sphinx, whatever), file systems, and backend datastores like Drizzle/MySQL/CouchDB/MongoDB or whatever.

To improve performance and deployment flexibility, a RiverBear plugin could later be added (to subsystems with plugin architectures) which would filter out data at the lowest layer possible. Example: RiverBear-Drizzle would use your current authentication and authorization to throw out any unauthorized data or raise errors for modifications. Regardless, the same filtering would also be done in the RiverBear frontend, so even without the plugin the system is secure (but less efficient).
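A Python sketch of that “filter low if you can, always filter at the frontend” design might look like this. `Backend`, `scan`, and `fetch` are hypothetical; the `predicate` hook stands in for whatever a RiverBear-Drizzle style plugin would actually expose.

```python
from typing import Callable, Dict, List, Optional, Set

Row = Dict[str, object]


class Backend:
    """Hypothetical datastore with an optional plugin hook for ACL pushdown."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows
        self.rows_returned = 0  # how many rows left the store on the last scan

    def scan(self, predicate: Optional[Callable[[Row], bool]] = None) -> List[Row]:
        # With a plugin installed, the store applies the ACL predicate itself
        result = [r for r in self.rows if predicate is None or predicate(r)]
        self.rows_returned = len(result)
        return result


def fetch(backend: Backend, roles: Set[str], plugin_installed: bool) -> List[Row]:
    def visible(row: Row) -> bool:
        return bool(row["readers"] & roles)

    rows = backend.scan(visible if plugin_installed else None)
    # The frontend filters regardless, so correctness never depends on the plugin
    return [r for r in rows if visible(r)]
```

Both paths return the same records; the plugin only reduces how much data leaves the store.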

There are many challenges to imagining this new service. How do ACLs and schema relate? Hierarchical and graph oriented stores seem natural, but can this work with document, column or relational stores? Security and privacy come at a performance or scaling cost; is it worth it? For which types of systems?

RiverBear is just a thought experiment at this point… what are your ideas? Feedback?

LDAP for MySQL geeks

Another way to approach learning OpenLDAP is through the lens of RDBMS. As a MySQL user… here are a few slapd equivalents.

Authentication

Some tools (like slapadd against the config directory) rely on authentication provided by your operating system (Ubuntu for me, which I think is an odd duck for OpenLDAP).

OpenLDAP’s documentation is leery of Linux distributions. I can see why: Ubuntu has some quirks. You use sudo and let the OS do authentication. It also keeps most of the config in an LDAP directory, instead of a config file. Whatever.

Other tools will be used with the “directory manager” username and password. There is often a special rootdn user set up. rootdn is magical and outside of ACLs. Kind of like the root user of MySQL, I guess.

The rootdn and the rootpw are set up specially. You can change these by changing the config and restarting the server. All other accounts are set up over LDAP. Again, like MySQL, but it takes a while to figure out.

Other tools, say ldapsearch, will be used as a user within the directory. These users are subject to Access Control Lists (ACL). Speaking of ACLs…

GRANT ALL ON db1.* TO 'homeslice'@'localhost';

ACLs are like MySQL GRANT… on 5-hour energy drinks!

There is a very rich DSL which lets you specify who has access to what. Access is very fine grained. I hope to cover this in another blog post.

Primary Keys

Remember those gnarly-ass distinguishedNames (dn)? Think of these as primary keys.

This was a stumbling block, but the dn, base dn, and default scope are actually cool features.

When I first learned RDBMS systems and tried my hand at data modeling, the natural inclination was to use composite natural keys to make a primary key. In practice you often use artificial keys.

In a way, directory services went for composite natural primary keys, arranged hierarchically. This is actually way more humane.

So a newbie in RDBMS will look for two probably-unique attributes and combine them… I’ll take John Smith and concatenate their job title; that should cover it: id=Fireman_JohnSmith

Well… this is how LDAP rolls: the path through the DB is the id.

US
  WA
    Seattle
        Fireman
            John Smith

This example is contrived… but you get the picture.

This id looks gnarly, but it makes a lot of sense:
dn: cn=John Smith,ou=Fireman,l=Seattle,st=WA,c=US

It’s a concatenation of how you get to that element in a hierarchical db.
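Building such a DN from the root-first path is mechanical, except for escaping. This Python sketch reverses the path and escapes values following RFC 4514 (backslash before `, + " \ < > ;`, a leading `#` or space, and a trailing space; NUL as `\00`). `escape_rdn_value` and `dn_from_path` are hypothetical helper names.

```python
from typing import List, Tuple

# RFC 4514 characters that must be backslash-escaped anywhere in a value
_SPECIAL = set(',+"\\<>;')


def escape_rdn_value(value: str) -> str:
    out = []
    for i, ch in enumerate(value):
        if ch in _SPECIAL:
            out.append("\\" + ch)
        elif ch == "\0":
            out.append("\\00")
        elif ch == "#" and i == 0:
            out.append("\\#")  # a leading '#' would look like a hex-encoded value
        elif ch == " " and (i == 0 or i == len(value) - 1):
            out.append("\\ ")  # RFC 4514 requires escaping leading/trailing spaces
        else:
            out.append(ch)
    return "".join(out)


def dn_from_path(path: List[Tuple[str, str]]) -> str:
    """Build a DN from (attribute, value) pairs listed root-first."""
    return ",".join(f"{attr}={escape_rdn_value(val)}" for attr, val in reversed(path))
```

Usage: `dn_from_path([("c", "US"), ("st", "WA"), ("l", "Seattle"), ("ou", "Fireman"), ("cn", "John Smith")])` walks the tree above from the top down and yields the leaf-first DN.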

You can use an arbitrary incrementing id, e.g. uid=111 instead of cn=John Smith. The contrived example above just shows how natural keys are possible.

mysqldump

slapcat is like mysqldump. It is low level and operates on the data directly. You can safely use this while slapd is online.

You can also craft ldapsearch queries to dump data, but this is slower and less complete.

How do I bulk load data?

LDIF is a data serialization format used throughout the command line tools. It is a bit like JSON or using CSV dumps from MySQL… pretty cool.

ldapadd or slapadd can be used to bulk load LDIF data. Slapadd is faster and operates directly on the datastore, but you must stop your server. Ldapadd goes over the LDAP protocol and is safer, but slower.
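If you are generating LDIF for ldapadd or slapadd from another source (say, rows out of MySQL), a small serializer helps. This Python sketch follows RFC 2849’s rules as I understand them: values that start with a space, colon, or `<`, end with a space, or contain NUL, CR, LF, or non-ASCII characters are written base64-encoded with `::`, and long lines are folded at 76 characters (a common choice). `to_ldif` and its helpers are hypothetical, not part of any OpenLDAP tool.

```python
import base64
from typing import Dict, List


def needs_base64(value: str) -> bool:
    """RFC 2849: some values can't be written as a plain SAFE-STRING."""
    if not value:
        return False
    if value[0] in " :<" or value.endswith(" "):
        return True  # unsafe first character, or an easily-lost trailing space
    return any(ch in "\0\r\n" or ord(ch) > 127 for ch in value)


def fold(line: str, width: int = 76) -> str:
    """Fold a long line; each continuation line starts with a single space."""
    if len(line) <= width:
        return line
    chunks = [line[:width]]
    rest = line[width:]
    while rest:
        chunks.append(" " + rest[: width - 1])
        rest = rest[width - 1:]
    return "\n".join(chunks)


def to_ldif(dn: str, attrs: Dict[str, List[str]]) -> str:
    lines = []
    for name, values in [("dn", [dn])] + list(attrs.items()):
        for value in values:
            if needs_base64(value):
                encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
                lines.append(fold(f"{name}:: {encoded}"))
            else:
                lines.append(fold(f"{name}: {value}"))
    return "\n".join(lines) + "\n"
```

Concatenate records with a blank line between them, then feed the file to ldapadd (online) or slapadd (server stopped).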

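Writing LDIF by hand? Beware: the LDIF parser (or is it the standard?) totally blows. Whitespace is significant. Pythonistas rejoice, but the rules are not the ones you expect: a line that begins with a single space continues the previous line, and trailing spaces become part of the value. I mumble and make sure to put whitespace very carefully.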
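The continuation rule is the one that bites. Here is a minimal Python sketch of reading one LDIF record: unfold continuation lines, skip comments, and decode `::` base64 values. It deliberately ignores `<` URL values, change records, and multi-record files; `unfold` and `parse_record` are hypothetical helpers, and for real work python-ldap ships an `ldif` module.

```python
import base64
from typing import Dict, List


def unfold(text: str) -> List[str]:
    """Join continuation lines: a line starting with one space continues the previous one."""
    lines: List[str] = []
    for raw in text.splitlines():
        if raw.startswith(" ") and lines:
            lines[-1] += raw[1:]  # drop exactly one leading space
        else:
            lines.append(raw)
    return lines


def parse_record(text: str) -> Dict[str, List[str]]:
    entry: Dict[str, List[str]] = {}
    for line in unfold(text):
        if not line or line.startswith("#"):
            continue  # blank lines and comments
        attr, _, rest = line.partition(":")
        if rest.startswith(":"):
            value = base64.b64decode(rest[1:].strip()).decode("utf-8")
        else:
            value = rest.lstrip(" ")  # only leading spaces are dropped; trailing ones stay
        entry.setdefault(attr, []).append(value)
    return entry
```

Note the folded line needs two spaces if the continued text itself should start with a space: one is the continuation marker, the other is data.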

InnoDB

Just as you can configure the storage engine of MySQL, the slapd backend is configurable. Typically data is stored in multiple Berkeley DB files, via either the bdb or the hdb backend.

Hosting

A single MySQL install can host multiple databases. A single LDAP directory server can store multiple directory trees.

Schema

However, schema information is global and bleeds into different directory trees. This seems like a pre-web version of distributed systems. Ouch.

In RDBMS systems, when you do DDL actions they are sandboxed to the current database. This is not the case with OpenLDAP. You define a foo attribute, it is global. You define a bar objectclass in directory A, yep… it bleeds into directory B.

So this can make writing installation instructions confusing. This is neither very agile nor sane. It’s like Dewey Decimal made it into the information age.


Oh… it gets even better… Welcome to Terry Gilliam’s Brazil

Would you like to create a new objectclass or attributetype? You’ll need to register an OID with the central authorities. Please mail one SASE to … As we learned in Everything Is Miscellaneous, this is horribly antiquated.

Your OID, which must be globally unique, will serve as your base OID and you’ll add more numbers to it to get a globally unique object identifier per attributetype or objectclass.
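Extending a base OID is just appending arcs, but it is easy to typo. A small Python sketch: 1.3.6.1.4.1 is the IANA arc for private enterprise numbers, while 99999 is a placeholder for whatever number you are assigned, and putting attributetypes under .1 and objectclasses under .2 is a common habit, not a rule. `child_oid` is a hypothetical helper.

```python
import re

# Dotted-decimal OID: at least two arcs, no leading zeros
_OID = re.compile(r"(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))+")


def child_oid(base: str, *arcs: int) -> str:
    """Extend a registered base OID with further numeric arcs."""
    if not _OID.fullmatch(base):
        raise ValueError(f"not a dotted-decimal OID: {base!r}")
    if not arcs or any(a < 0 for a in arcs):
        raise ValueError("need one or more non-negative arcs")
    return ".".join([base, *(str(a) for a in arcs)])


BASE = "1.3.6.1.4.1.99999"            # hypothetical enterprise number
ATTRIBUTE_TYPES = child_oid(BASE, 1)  # convention: .1 for attributetypes
OBJECT_CLASSES = child_oid(BASE, 2)   # convention: .2 for objectclasses
```

Your seventh attributetype would then be `child_oid(BASE, 1, 7)`, i.e. 1.3.6.1.4.1.99999.1.7.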

You don’t have to look at OIDs very often. You can alias them to friendly names. But be aware of them.

I recommend hijacking others’ OIDs for fun and profit ;) Works for Twitter handles and domain names, right?

Foreign Keys

MySQL has foreign key references. You can do the same thing by using attributes which are distinguishedName references.

You can use a dn for a value. Common attributetypes for this are seeAlso and member. This is super cool and like a foreign key or symbolic link.

By default LDAP doesn’t enforce referential integrity.

  • You can add a dn that doesn’t exist
  • Deleting a record doesn’t purge dn references

There is a RefInt overlay available for providing referential integrity. Overlays are like extensions and there are several available to add services in a performance sensitive manner.

People are pretty comfortable enforcing referential integrity in the application or another layer these days, so it’s all good.
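If you do enforce it in the application (rather than with the RefInt overlay), a periodic audit for dangling references is easy to sketch in Python. `DN_VALUED` is a hand-picked subset of DN-valued attributes, and `normalize_dn` is a deliberate simplification; both are hypothetical.

```python
from typing import Dict, List, Tuple

Entries = Dict[str, Dict[str, List[str]]]

# Attributes whose values are DNs (a hand-picked subset for illustration)
DN_VALUED = {"member", "seealso", "manager", "owner"}


def normalize_dn(dn: str) -> str:
    # Simplified: lowercase and trim around commas. Real DN comparison follows
    # RFC 4514 parsing (escaped commas!) and each attribute's matching rule.
    return ",".join(part.strip().lower() for part in dn.split(","))


def dangling_references(entries: Entries) -> List[Tuple[str, str, str]]:
    """Return (entry dn, attribute, value) for every reference to a missing dn."""
    known = {normalize_dn(dn) for dn in entries}
    problems = []
    for dn, attrs in entries.items():
        for name, values in attrs.items():
            if name.lower() not in DN_VALUED:
                continue
            for value in values:
                if normalize_dn(value) not in known:
                    problems.append((dn, name, value))
    return problems
```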

Schema Migration

A pain point with RDBMS and web applications is schema evolution. Rolling out schema migrations to big data systems is a PITA. NoSQL databases are a current topic for many reasons, but this is one of the drivers.

An LDAP directory’s schema is even more rigid than an RDBMS’s. Reading the literature, one gets a sense that you should design it right the first time. In practice, it’s not that type of party.

Wanna change something? I haven’t found an easy-to-use DDL. You have to use ldapmodify and a DSL to remove attributetypes, then re-add them, etc. Remember, this affects every directory running under slapd. I also got a lot of errors, but maybe I fat-fingered something.

I imagine it is possible that good Emergent Design methodology and auxiliary types might combat this issue, following the open/closed principle and such. Good luck with that one on real world teams :)

I’d advocate keeping the LDAP layer as thin as possible and using it only when appropriate. Data storage can be augmented with web services, RDBMS, and NoSQL backends.

Brutal Workflow

Please let me know of something that works better on Ubuntu’s OpenLDAP, but here is what I do to rapidly iterate a schema design:

The key is nuking all OpenLDAP config files as well as the low level bdb files.

DB client

I use something like DBVisualizer when working on relational databases. The equivalent is Apache Directory Studio. This app is great for poking around and learning LDAP concepts. It’s easier to use once you understand how directories work.

Conclusion

That should be enough to get the MySQL geek going on next steps with slapd.

LDAP Object Oriented Programmers

As I’m learning OpenLDAP, it seems useful to draw on programming language concepts to understand the system.

Data from (and the schema behind) an LDAP directory is object oriented and seems to have less of an impedance mismatch between code and data than the more popular RDBMS.

Structure

Schema definition appears to be highly inspired by object oriented inheritance, a popular programming paradigm. Every record has a structural objectclass. Classes come in three flavors:

  • Abstract
  • Structural
  • Auxiliary

These notions are quite familiar to OO programmers. Structural classes are the workhorses of schema definition and are basically what most languages provide as a class. Abstract classes are like abstract base classes in Java. Auxiliary classes are like interfaces in Java or mixins in Ruby.

Let’s look at an example Schema – an objectclass for countries

objectclass ( 2.5.6.2 NAME 'country'
        DESC 'RFC2256: a country'
        SUP top STRUCTURAL
        MUST c
        MAY ( searchGuide $ description ) )


A record cannot be created for an abstract objectclass alone; every record must have at least one structural class. The root of the LDAP schema inheritance hierarchy is top, which has one mandatory attribute: objectClass.

An objectclass can use the SUP keyword to inherit from another objectclass; the default value for SUP is top. Objectclasses declare which attributes MUST and MAY appear in a record. A record can’t have arbitrary elements; each field must have been declared in the schema.


Multiple Inheritance diagram.

It is illegal to do multiple inheritance where a record has two structural object classes that diverge, such as D and E in the diagram above.

A record can have multiple objectclasses, so a record that declares A, B, and D is fine.

Records may freely mix in auxiliary objectclasses to pick up new well understood properties on existing structural classes. So this is a multiple inheritance like pattern.

To figure out which is the most concrete class, the system looks for the structural objectclass furthest from top.
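The rules above (one chain of structural classes, pick the one furthest from top, auxiliaries mix in freely) can be sketched in Python. `SCHEMA` holds a handful of real classes (top, person, organizationalPerson, inetOrgPerson, country, and the auxiliary posixAccount from nis.schema) reduced to their SUP and kind; `ancestors` and `structural_class` are hypothetical helpers, not slapd’s code.

```python
from typing import Dict, List, Optional, Tuple

# name -> (SUP, kind); a few real classes, heavily simplified
SCHEMA: Dict[str, Tuple[Optional[str], str]] = {
    "top": (None, "ABSTRACT"),
    "person": ("top", "STRUCTURAL"),
    "organizationalPerson": ("person", "STRUCTURAL"),
    "inetOrgPerson": ("organizationalPerson", "STRUCTURAL"),
    "country": ("top", "STRUCTURAL"),
    "posixAccount": ("top", "AUXILIARY"),
}


def ancestors(name: str) -> List[str]:
    """The class itself followed by its SUP chain up to top."""
    chain = []
    current: Optional[str] = name
    while current is not None:
        chain.append(current)
        current = SCHEMA[current][0]
    return chain


def structural_class(object_classes: List[str]) -> str:
    structural = [c for c in object_classes if SCHEMA[c][1] == "STRUCTURAL"]
    if not structural:
        raise ValueError("a record needs a structural object class")
    # The most concrete class is the structural one furthest from top
    deepest = max(structural, key=lambda c: len(ancestors(c)))
    chain = set(ancestors(deepest))
    for c in structural:
        if c not in chain:  # two structural branches diverge, like D and E
            raise ValueError(f"{c} and {deepest} are on diverging structural chains")
    return deepest
```

Auxiliary classes such as posixAccount never take part in the structural check, which is exactly what makes them behave like mixins.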

In addition to objectclasses being hierarchical, the attributes on an objectclass are also hierarchical. The mechanism is also the SUP keyword.

Unlike objectclass, though, an attributetype with no SUP simply has no parent: top is an objectclass, not an attributetype, so there is no top to add here.

Let’s see an example of attributetype schema:

attributetype ( 2.5.4.41 NAME 'name'
       EQUALITY caseIgnoreMatch
       SUBSTR caseIgnoreSubstringsMatch
       SYNTAX 1.3.6.1.4.1.1466.115.121.1.15{32768} )

Ignore 1.3.6.1.4.1.1466.115.121.1.15; it is the object identifier (OID) of the Directory String syntax, i.e. a UTF-8 string. We’ll cover OIDs in the next (MySQL-oriented) blog post.

The SYNTAX of an attribute specifies what kind of data can be stored in the field: UTF-8 strings, phone numbers, JPEG binary blobs, etc. A syntax can also carry a suggested upper bound on length via the {n} suffix, as in {32768} above.

Continuing looking at how to model countries, let’s look at the definition of ‘c’ that the earlier objectclass declared as a MUST have attribute.

attributetype ( 2.5.4.6 NAME ( 'c' 'countryName' )
        DESC 'RFC2256: ISO-3166 country 2-letter code'
        SUP name SINGLE-VALUE )

Pretty straightforward. We inherit from ‘name’. Notice the SINGLE-VALUE keyword. An attribute can be a list of values, unless you constrain it to a single value. This seems like a pretty elegant solution when modeling data.

Looking at all the attributetypes and objectclasses is a bit like looking at a Django project’s models.py.
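To push the models.py comparison a little further, here is a Python sketch that treats the country objectclass from above as a class with inherited MUST and MAY sets and checks a record against it. `ObjectClass`, `validate`, and the error strings are made up for illustration; the sketch ignores attribute aliases (countryName vs. c), SINGLE-VALUE, and syntax checks, all of which slapd handles.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class ObjectClass:
    name: str
    sup: Optional["ObjectClass"] = None
    must: Tuple[str, ...] = ()
    may: Tuple[str, ...] = ()

    def all_must(self) -> Set[str]:
        inherited = self.sup.all_must() if self.sup else set()
        return inherited | {a.lower() for a in self.must}

    def all_may(self) -> Set[str]:
        inherited = self.sup.all_may() if self.sup else set()
        return inherited | {a.lower() for a in self.may}


TOP = ObjectClass("top", must=("objectClass",))
COUNTRY = ObjectClass("country", sup=TOP, must=("c",), may=("searchGuide", "description"))


def validate(entry: Dict[str, List[str]], oc: ObjectClass) -> List[str]:
    present = {name.lower() for name in entry}  # attribute names are case-insensitive
    errors = [f"missing MUST attribute {a}" for a in sorted(oc.all_must() - present)]
    allowed = oc.all_must() | oc.all_may()
    errors += [f"attribute {a} not allowed by {oc.name}" for a in sorted(present - allowed)]
    return errors
```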

So these schemas capture the static aspects of a directory.

At runtime, the records in the system also exhibit Object Oriented behavior.

Behavior

Several declarations of an attribute control how search works. Equality and substring matching are controlled declaratively.

As noted, this is kind of bizarre, but an LDAP directory provides search, so there you go. If your blogTitle should be searchable just like a person’s given name, you can look through the schema and either make it inherit from name or declare the same matching rules (EQUALITY, SUBSTR).
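To see what those declarations buy you, here is a rough Python approximation of caseIgnoreMatch and caseIgnoreSubstringsMatch, the rules name uses. The real rules prepare strings per RFC 4518 (Unicode normalization, insignificant-space handling); `prep` below only collapses whitespace and case-folds, so treat all three functions as a sketch.

```python
def prep(value: str) -> str:
    """Very simplified string preparation: collapse whitespace, fold case."""
    return " ".join(value.split()).casefold()


def case_ignore_match(a: str, b: str) -> bool:
    return prep(a) == prep(b)


def case_ignore_substrings_match(value: str, pattern: str) -> bool:
    """Match a filter value like 'jo*sm*' (initial*any*final) against value."""
    v = prep(value)
    parts = [prep(p) for p in pattern.split("*")]
    if len(parts) == 1:
        return v == parts[0]  # no wildcard: plain equality
    initial, *middle, final = parts
    if not v.startswith(initial):
        return False
    pos = len(initial)
    for piece in middle:  # each middle piece must appear, in order
        idx = v.find(piece, pos)
        if idx < 0:
            return False
        pos = idx + len(piece)
    # final must match at the end without overlapping earlier pieces
    return v.endswith(final) and len(v) - len(final) >= pos
```

Declaring EQUALITY caseIgnoreMatch and SUBSTR caseIgnoreSubstringsMatch on blogTitle would give it this behaviour for free.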

These attribute settings can be inherited, and many attributes in the base schemas inherit from name.

A directory’s layout is hierarchical, but doesn’t have to follow the same structure as the objectclass structure. Records are grouped under grouping objectclasses. Data is organized with regard to data access patterns and how ACL will be written.

You can think of an LDAP directory as a web service that returns LDIF instead of JSON. Your code runs in the application or middleware layer, which you have full control over. The LDAP protocol provides sophisticated operations for reading, writing, and searching objects.

Here is some example LDIF that captures a country:

dn: c=US,dc=example,dc=com
objectClass: country
c: US
description: The marshmallow filling between Mexico and Canada.


The dn is the only funky bit here; otherwise this looks like an easy format, perhaps a cousin of JSON.

As noted, a record may have multiple object classes and can be much richer than this small snippet.

Summary

Hey OOAD fans, directories are inherently OO! An object class is part of every record. This schema information says what can, must, and can’t be in a record’s attributes. Also… a record can have multiple object classes… Hello mixins! A record must have one super class which is called the structural object class, because that sounds tough.

Hopefully this gives us another perspective for starting to understand LDAP directories.

Why LDAP isn’t more widely adopted… or WTF?!?

Here are some perceived problems and recommendations. My recommendations are coming from a total newbie, so they are fairly worthless.

This post is about barriers to adoption and will be the most negative of this series. I will be dropping some f-bombs.

Let’s start with some nice things… we’ll get to the constructive feedback soon enough :)

LDAP is a protocol, not an API. This is actually really cool and very in-fashion.

LDAP directories are distributed (in more than one way), sweet.

Another cool thing about LDAP is that records can have some flexibility to their schema, but they still have a schema. You can augment a record with an “auxiliary” type. Types can have mandatory and optional elements. Ad-hoc elements aren’t allowed, they must be in a schema. This seems like a sweet spot for some classes of data storage solutions.

For such an ancient and storied beast, OpenLDAP gets many things right. If you were to re-launch it as a new NoSQL project, written in NodeJS, you’d probably get some love. Developers love the new hawtness.

Problem: Bizarre keywords

Technically these aren’t keywords, but to a programmer just encountering LDAP, that is what they seem like.

Attributes are standardized and have inconsistent and wacky names. As a programmer, I’m used to creating my universe on top of a standard library. A programmer approaching data and data definitions… I make that up as I go, right?

dn

What the fuck is a dn? Oh, it’s a distinguished name. Well… how… distinguished.

A DN is like an ID or a KEY. People know those terms. It’s okay if it isn’t technically accurate. Humans remember a good story over a technically accurate story.

dn = distinguishedName… Okay, so everything is super abbreviated, I can dig it… Let’s see: O=Acme Services, telephoneNumber=+1 800 555 9834, postalAddress= 98177, l=Seattle…

What, what? Inconsistent much? So it appears that objectclasses and attributetypes can have multiple aliases.

o is an alias for organizationName and l for localityName.

Tweet you’ll never see:

OH: Hey @Jill, I forgot, which localityName do you live in again? #onthebus

This causes my WTF/minute to spike. I imagine one of the following created these names:

  • Regional oddity
  • Authors crashed on earth in space pod
  • The authors took a break from speaking Esperanto long enough to form a quorum and vote in the schema
  • Maybe that schema was baked in a BOF at DirectoryCon in Atlantis 1872

Who knows… once you grok the names and the semantics, it’s all good.

Mixed Case

Why do I see o and O used for attributetype names? Because it’s case-insensitive. Like Windows 3.1.
o_O

Recommendation:

Objectclasses and AttributeTypes may have multiple names. Always use the most descriptive name or invent aliases that aren’t totally fucked. Don’t use more than one alias. I know naming things is hard.
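One way to follow that advice in application code is to normalize every attribute name to its descriptive form as soon as data comes back from the directory. A Python sketch with a few long-name/alias pairs from the standard user schema (RFC 4519); `canonical_name` and `canonicalize` are hypothetical helpers.

```python
from typing import Dict, List

# (long name, short alias) pairs from the standard user schema (RFC 4519)
ALIASES = [
    ("commonName", "cn"),
    ("countryName", "c"),
    ("localityName", "l"),
    ("organizationName", "o"),
    ("organizationalUnitName", "ou"),
    ("stateOrProvinceName", "st"),
    ("surname", "sn"),
]

# Attribute type names are case-insensitive, so key everything by lowercase
CANONICAL: Dict[str, str] = {}
for long_name, short_name in ALIASES:
    CANONICAL[long_name.lower()] = long_name
    CANONICAL[short_name.lower()] = long_name


def canonical_name(attr: str) -> str:
    return CANONICAL.get(attr.lower(), attr)


def canonicalize(entry: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Rename every attribute to its descriptive name, merging values of aliases."""
    out: Dict[str, List[str]] = {}
    for name, values in entry.items():
        out.setdefault(canonical_name(name), []).extend(values)
    return out
```

With this in place, `O`, `o`, and `organizationName` all land in the same key, so the WTF/minute stays at the directory boundary.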

Problem: Discoverability

It’s very hard to ‘view source’ on an existing app and understand WTF is a dn… These things are buried deep in the system in .schema or .ldif files. When you read them, they read like a Dewey Decimal’s Bureaucratic Vision of the New World Order.

Recommendation:

I haven’t found a really good document or manual of the existing standardized schemas. One that makes it easy to see data types, reasons they exist, etc. I have to resort to grepping the schema files or browsing via Apache Directory Studio.

Problem: LDAP Directories do too much

I can sympathize with the successive generations of decisions that went into creating LDAPv3. In system design there are lots of hard decisions about when to separate policy (application-level minutiae) from the application (or service) itself. What do you

  • Bake in?
  • Make controllable programmatically?
  • Make configurable?

LDAP makes very different design decisions than, say, a SQL-92 database about what is provided via the LDAP protocol versus what belongs at the application level or should be provided by other tools.

An example is search.

LDAP schemas encode what counts as a valid search strategy, instead of delegating this to a search layer outside of the LDAP schema (or outside of LDAP entirely)…

Oh my. Wanna swap out Sphinx for elasticsearch? It’s not that type of party.

So your schema will encode that you can search with ‘o*’ against country fields and get back a boatload of countries. You might expect to decide this at the application layer instead. If you don’t want to allow wildcard characters, or you want to make sure that search results are conservative, you’ll need to remove ‘*’ from search input or post-process the search results.
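Another option is to escape user input rather than strip it. RFC 4515 says that `*`, `(`, `)`, `\` and NUL inside a filter value must be written as a backslash plus two hex digits, so an escaped `o*` matches only the literal string. A minimal Python sketch; the `escape_filter_value` name is hypothetical, and python-ldap ships a comparable `ldap.filter.escape_filter_chars`.

```python
# Characters RFC 4515 requires to be escaped inside a filter assertion
# value, mapped to their backslash-hex forms.
_FILTER_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\x00": r"\00",
}


def escape_filter_value(value: str) -> str:
    """Escape user input so it matches literally inside an LDAP filter.

    Each character is looked up independently, so a backslash produced by
    one replacement is never escaped a second time.
    """
    return "".join(_FILTER_ESCAPES.get(ch, ch) for ch in value)


user_input = "o*"
search_filter = f"(c={escape_filter_value(user_input)})"
print(search_filter)  # (c=o\2a)
```

The escaped filter asks for a country value that is literally `o*`, instead of every value starting with "o".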

Recommendation:

We haven’t talked about ACLs yet; they explain why search is so tightly coupled to the directory.

I punt. Software is hard.

Problem: The CRM/CMS/Groupware problem

I think it’s incredibly hard to create large, general, reusable applications. Look at SharePoint, Plone, Netscape Suite, etc.

Really good software is small and tailored exactly to the organization or social function it serves. Building really good general purpose software is usually impossible. Hats off to (any) level of success here.

Recommendation:

I punt. Software is hard.

Problem: Schemas and Data

Snippet of LDIF (serialized data):

dn: uid=john,ou=people,dc=example,dc=com
objectClass: inetOrgPerson
objectClass: posixAccount
objectClass: shadowAccount
uid: john
sn: Doe
givenName: John
cn: John Doe
displayName: John Doe
uidNumber: 1000
gidNumber: 10000
gecos: John Doe


Wow. Really?

The idea that LDAPv3 specifies schema is odd and surprising. Maybe the protocol doesn’t (I haven’t read the RFCs), but it’s deeply ingrained in the culture that you use at least the 4 most popular schemas out of the box.

There is an amazing amount of redundant information; LDIF files routinely violate the DRY principle. Tooling can fix the DRY issue.
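As a sketch of what "tooling can fix it" might look like, this hypothetical Python helper (`person_ldif` is not a real tool) rebuilds the entry above from five facts; the DN, `cn`, `displayName`, and `gecos` are derived instead of typed by hand. It reproduces the snippet as shown.

```python
def person_ldif(uid: str, given_name: str, surname: str,
                uid_number: int, gid_number: int,
                base: str = "ou=people,dc=example,dc=com") -> str:
    """Build an inetOrgPerson/posixAccount LDIF entry from a few facts.

    Hypothetical helper: the DN comes from uid + base, and cn,
    displayName, and gecos are all derived from given name and surname.
    """
    full_name = f"{given_name} {surname}"
    lines = [
        f"dn: uid={uid},{base}",
        "objectClass: inetOrgPerson",
        "objectClass: posixAccount",
        "objectClass: shadowAccount",
        f"uid: {uid}",
        f"sn: {surname}",
        f"givenName: {given_name}",
        f"cn: {full_name}",
        f"displayName: {full_name}",
        f"uidNumber: {uid_number}",
        f"gidNumber: {gid_number}",
        f"gecos: {full_name}",
    ]
    return "\n".join(lines) + "\n"


print(person_ldif("john", "John", "Doe", uid_number=1000, gid_number=10000))
```

One caveat the snippet glosses over: RFC 2307 lists `homeDirectory` as a required attribute of `posixAccount`, so a server with schema checking enabled would likely reject this entry until you add it.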

But an inetOrgPerson out of the box? It’s a bit like a programming language providing a Person class in its standard library. How’s that working out for you?

No respectable RDBMS comes with a predefined schema. I’d giggle if MySQL offered inetOrgPerson tables with organizationName or localityName columns. Again, this is the “too many layers to understand at once” problem.

Related Problem:

If I start whipping up my own schema… I quickly run into the crazy constraint that every schema on the planet shares the same namespace. W… T… F… My slapd server can host multiple directories, but it can’t have attribute name clashes between my different projects. What the What? (We’ll cover OIDs in another post.)

LDAP and X.500 were designed many moons ago. A single global namespace is an inferior strategy for distributed system design.

Recommendation:

Schema should be namespaced and more cleanly separated from the daemon that provides LDAP Directory services. I doubt an LDAPv4 could add this w/o breaking backwards compatibility. Java’s package namespacing conventions have done good work here.
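To make the clash concrete, here is a toy Python model (the `SchemaRegistry` class is hypothetical and says nothing about how slapd is implemented) of one flat, case-insensitive pool of attribute names shared by every project on a server, next to Java-style qualified names like the ones the recommendation has in mind. Dotted names are not legal LDAP attribute names today; they only illustrate the idea.

```python
class SchemaRegistry:
    """Toy model of a single, shared attribute-name namespace."""

    def __init__(self) -> None:
        # Lowercased attribute name -> project that defined it first.
        self._owners: dict[str, str] = {}

    def define(self, project: str, name: str) -> None:
        """Register an attribute name, refusing case-insensitive clashes."""
        key = name.lower()
        if key in self._owners:
            raise ValueError(
                f"{name!r} from {project} clashes with {self._owners[key]}"
            )
        self._owners[key] = project


# Flat namespace: two unrelated projects collide on a generic name.
flat = SchemaRegistry()
flat.define("recipes", "rating")
try:
    flat.define("mozillians", "Rating")
except ValueError as err:
    print(err)  # 'Rating' from mozillians clashes with recipes

# Hypothetical package-style qualified names let both projects coexist.
namespaced = SchemaRegistry()
namespaced.define("recipes", "com.example.recipes.rating")
namespaced.define("mozillians", "org.mozilla.mozillians.rating")
```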

Problem: Unexpected data and programming models

The great thing about learning new systems or languages is that it stretches your mind. Coming to LDAP through casual contact, I ran into a lot of unexpected idioms.

  • Searching to check whether something exists doesn’t always return a value
  • Advanced, nuanced binding is baked in and uses unusual terminology
  • Type systems are hierarchical, data is hierarchical, and the two don’t have to be related

The way data is modeled and organized is very different from an RDBMS.

To this last point, exploring the data sets in LDAP directories can be a much more natural experience than wading through tables in an RDBMS.
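As a small illustration of that tree-shaped exploration, here is a Python sketch that turns a flat list of DNs into an indented outline using nothing but each DN’s parent. The `tree_lines` helper and the sample DNs are hypothetical, and the comma splitting is naive, so escaped commas are not handled.

```python
from collections import defaultdict


def parent_dn(dn: str) -> str:
    """Drop the leftmost RDN. Naive: assumes no escaped commas."""
    return dn.partition(",")[2]


def tree_lines(dns: list[str]) -> list[str]:
    """Render DNs as an indented outline, each child under its parent."""
    children = defaultdict(list)
    for dn in dns:
        children[parent_dn(dn)].append(dn)

    lines: list[str] = []

    def walk(dn: str, depth: int) -> None:
        # Roots show their full DN; children show only their own RDN.
        label = dn if depth == 0 else dn.partition(",")[0]
        lines.append("  " * depth + label)
        for child in sorted(children[dn]):
            walk(child, depth + 1)

    known = set(dns)
    for dn in sorted(dns):
        if parent_dn(dn) not in known:  # parent absent, so treat as a root
            walk(dn, 0)
    return lines


DNS = [
    "uid=john,ou=people,dc=example,dc=com",
    "dc=example,dc=com",
    "ou=groups,dc=example,dc=com",
    "ou=people,dc=example,dc=com",
    "uid=jill,ou=people,dc=example,dc=com",
]
print("\n".join(tree_lines(DNS)))
```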

LDAP directories are fairly sophisticated. This means quite a bit of training is required before someone feels comfortable using or tweaking things. Ruby embraces POLA (the Principle of Least Astonishment). LDAP, for whatever reason, has a lot of surprises.


There are several great books and free materials out there on OpenLDAP, so I don’t think this is a deal-breaker, it just reduces casual adoption.

So those are some initial thoughts from a web developer trying to understand LDAP. LDAP systems are widely deployed in IT departments, but they’re never the first tool to reach for in the startup and Internet programming worlds.

These assumptions and confusions hopefully show some of the disconnects that hinder adoption by new users. Seeing code that uses LDAP, or snippets of schemas or LDIF files, can make a programmer feel uneasy. It seems like foreign territory.

Having read (the two resources above) and implemented a toy project (a recipe directory), I find that OpenLDAP doesn’t seem that crazy or foreign. It’s got a few warts and scars like any successful real-world technology.

Instead of just jovial complaining, my next two posts will help MySQL-oriented users as well as OO programmers get started with OpenLDAP concepts and tools.

OpenLDAP Notes

I’m going to write a series of short blog posts about my nascent experiences with LDAP directories.

I hope to cover the good, the bad, and the fugly of this beast.

Planned posts will include:

  • Why LDAP isn’t more widely adopted or WTF?!?
  • LDAP for MySQL peeps
  • LDAP for OO programmers
  • Care and feeding for the total newb (tips)
  • Directory data modeling example

Why Study OpenLDAP?

I’m helping organize the Mozillians.org project. To enable advanced privacy experiments, we’re using OpenLDAP as the initial backend.

I need to get my head around it enough to be able to shoot myself in the foot :)

I’ve always wanted to study a hierarchical DB, beyond file systems and other common data stores. I didn’t really find one… until LDAP.

There are four main types of databases:

  • Relational
  • Hierarchical
  • Graph
  • Document

Web developers tend to focus on RDBMS and Document (NoSQL) databases. RDBMS are so ingrained in us that we’ve standardized many webapp frameworks on top of the ActiveRecord pattern.

Lots of creative energy is being put into new databases (CouchDB, MongoDB, etc.) and data structure servers (Redis) that push and remix the ideas of these four paradigms.

OpenLDAP most closely aligns with the hierarchical flavor of database management system. I really enjoy studying systems that have stood the test of time. You can learn a lot by examining their strengths and flaws.

Many of Mozilla’s core developer webtools integrate with our existing LDAP instance.

There are many large OpenLDAP installations in the wild. It’s ancient, robust, and optimized for certain classes of problems.

I’m going to say lots of positive and negative things about LDAP. These are my observations and aren’t terribly clueful or empirical, so please educate me. My goal isn’t to flame the OpenLDAP community, but to give honest insight into the beginner’s mind.

So… onwards!

Mozilla Beta – Innovating in the Community while Protecting Production

TL;DR: We need a Beta label for community websites. I wanted to capture these personal thoughts, which I’ve had while acting as a Webdev Steward on various projects including the contribute page, ReMo, etc.

Mozilla Websites have goals:

  • Protect our millions of web users
  • Create innovative software and processes to engage our community and the world
  • Exemplify World Class execution

As Community members, we have goals:

  • Solve chronic problems
  • Create useful tools
  • Innovate at break-neck speed.

So our end goals can sometimes conflict with our immediate goals. I’ve learned through personal experience being both an employee and a community member that:

It’s a waste of time to innovate through production channels.

That is okay, there is a solution…

Mozilla is great at creating Sandboxes which are safe to innovate in. Think about Add-ons. You can completely change Firefox and share this with others. Many ideas discovered and refined in Add-ons get productized into Firefox.

Mozilla is a web innovation party and
everyone is invited!

I think we need another sandbox at the web application level. This isn’t a technical sandbox, but a cultural and process sandbox.

Build “Mozilla Beta” websites

Innovate, measure, tweak, and refine at break-neck speed. Do this on a domain name you control and deploy.

Cobble together solutions using unreliable 3rd party services, Perl, duct-tape, and cut corners. It’s a prototype and it’s part of Web Science.

Once your product is tight, you can see successful KPIs, and the community loves it… then pay the Productionization Tax and make it a sustainable Mozilla Corporation project.

Productionization Tax

A lot of blood, sweat, and tears go into an officially supported web application at Mozilla.

You have to write your code to certain standards on Playdoh using MySQL. Your frontend code has to be cutting-edge-industry-awesome-sauce. You have to communicate your feature set to WebQA. You need to review their test plans. You need to give Operations a heads-up on infrastructure needs. Deployment requires detailed instructions, including every config value. You can’t try out arbitrary services like MongoDB without working closely with IT weeks ahead of time. Maybe MongoDB didn’t work out… Will you throw it away after investing days or weeks of political capital, process, and hardware in it? You have to ask all these other teams for resources and align your schedule with theirs. Did I mention legal and privacy vetting? Ack!

This is reality. This is how we meet our goals of protecting our users. This is how we maintain high quality products like AMO.

The Beta Label

We need a label for projects in their non-production state. Some ideas:

  • Beta
  • Alpha
  • Community
  • Incubator
  • Labs
  • Drumbeat

One meaning of “Mozilla Community” is exactly this purpose. Unfortunately, as a sandbox it needs branding work. The basic issue is that it’s a long tagline with overloaded meanings. Maybe we should stick with Mozilla Community Website, as there has already been some design work and many Beta projects already use this label. If so, we need to continue to popularize the sandbox aspect of the Community label.

Labs is already used by the Mozilla Labs organization.

I like Beta since people are familiar with it. Alpha is also a good candidate since “Beta” is often tied to a production system (like Gmail). Betas often live on a subdomain of the production domain, which is not the case here.

Not sure how Drumbeat (c|w)ould work or if it would be bad to overload the term.

What do you think? I think there is a better label than Community or Beta…

Betas Case Study: Labs

Mozilla Labs is a de-facto sandbox.

This tension between beta and production is something Mozilla Labs has to work within. Ubiquity and Weave were beta projects in the sense that they were developed outside of our production channels. Labs works hard to establish the right credibility expectations:

This is incomplete, buggy, and not production scale. Please join us in building the future. — roughly Labs Social Contract

The Labs user community understands they may lose data and that they are using a Labs product. APIs might change overnight. Features are added and tossed with reckless abandon, and this is good.

Eventually Ubiquity and Weave were productized into Jetpack and Firefox Sync. They are now a responsibility of official channels such as the Mozilla Services group. User data is sacred, good security, high quality code and processes are a must.

Betas Case Study: mozdev.org

David Boswell explained that mozdev.org existed before AMO and is the seed for lots of productized initiatives.

Being Beta

This sandbox is not a silver bullet. Going from Beta to production might result in:

  • Rewriting your product
  • Having your users create a new account on a different domain
  • Hair Loss

I think the Beta tax is much smaller than the Productionization Tax and that you can get further, faster in Beta.

Myth?

My project can’t be successful without being on a mozilla.org domain.
To me the quality of your goals, imagery, UX, and copy will make your project official. It’s about communication, execution, and usage. The domain doesn’t matter. Look at these projects, all thoroughly Mozilla:

Your Thoughts

Organizationally, would a Mozilla Beta website sandbox be a good tool? What should we call it?

Introducing GSD – Getting Shit Done!

I’ve created a simple TODO Web Application that is compatible with the Getting Things Done system.

It should work on a modern phone, tablet or desktop.

Leading up to the Firefox 4 launch, I’ve kept myself from working on any side projects. That said, sometimes I need to hack on something to put myself to sleep. Uh oh, somehow a new side project was born.

I wanted to learn Responsive Design, the IndexedDB API, and play with Mobile Web Applications… so I decided to write a little personal productivity app. I ended up using jQuery Mobile for the mobile and desktop UI. Sadly, Fennec support for IndexedDB hasn’t landed, so you’ll have to use the stock browser on Android.

Notes:

  • One codebase for mobile or desktop
  • Fluid layouts
  • Two formats: small and medium
  • Offline support, iOS homescreen integration, Chrome Webstore integration
  • All data is 100% client side – great privacy
  • No Sync :(

Try it out and let me know what you think!