Conceptual Data Modelling

The biggest area of risk on any Salesforce implementation project is the data model. In my view this assertion is beyond question. The object data structures and relationships underpin everything. Design mistakes made in the declarative configuration, or indeed in technical components such as errant Apex Triggers and poorly executed Visualforce pages, are typically isolated and therefore relatively straightforward to remediate. A flawed data model, by contrast, will affect every aspect of the implementation, from the presentation layer through to the physical implementation of data integration flows. This translates directly into build time, build cost and total cost of ownership. It is therefore incredibly important to spend time ensuring that the data model is efficient in terms of normalisation, robust and fit for purpose, but also that large data volumes (LDV) are considered, that business-critical KPIs can be delivered via the standard reporting tools and that a viable sharing model is possible. These latter characteristics relate to the physical model, meaning the translation of the logical model into the target physical environment, i.e. Salesforce (or perhaps database.com).

Taking a step back, the definition of a data model should journey through three stages: conceptual, logical and physical design. In the majority of cases, projects jump straight into entity relationship modelling, a logical design technique. In extreme cases the starting point is the physical model, where traditional data modelling practice is abandoned in favour of a risky incremental approach, with objects being identified as they are encountered in the build process. In many cases starting with a logical model can work very well, enabling a thorough understanding of the data to be developed, captured and communicated before the all-important transition to the physical model.
In other cases, particularly where there is high complexity or low understanding of the data structures, a preceding conceptual modelling exercise can greatly help to ensure the validity and efficiency of the logical model. The remainder of this post outlines one useful technique for conceptual data modelling: Object Role Modelling (ORM).

I first started using ORM a few years back on accounting-related software development projects where the data requirements were emergent and the project context was highly complex. There was also a need to communicate early forms of the data model in simple terms and to show the systematic, fact-based nature of its composition. The ORM conceptual data model delivered precisely this capability.

ORM – What is it?
Object Role Modelling is a conceptual data modelling technique based on the definition of facts expressed in natural language and intuitive diagrams. ORM models are subject to rigorous data population checks, the addition of logical constraints and iterative improvement. A key concept of ORM is the Conceptual Schema Design Procedure (CSDP), a prescriptive seven-step approach to applying ORM, i.e. to the analysis and design of data. Once the conceptual model is complete and validated, a simple algorithm can be applied to produce a logical view, i.e. a set of normalised entities (ERD) that are guaranteed to be free of redundancy. Generating a robust logical model directly from the conceptual schema in this way is a key benefit of the ORM technique.
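To give a flavour of that mapping step, here is a heavily simplified sketch in the spirit of Halpin's relational mapping procedure (Rmap). It assumes only binary fact types with a single uniqueness constraint each; the real procedure also handles one-to-one fact types, mandatory roles, subtypes and more. The `FactType` class and `map_to_tables` function are hypothetical illustrations, not part of any ORM tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactType:
    """A binary ORM fact type, e.g. 'Employee works for Department'."""
    reading: str
    first: str      # object type playing the first role
    second: str     # object type playing the second role
    unique_on: str  # uniqueness constraint: 'first', 'second' or 'both'


def map_to_tables(fact_types):
    """Simplified Rmap-style grouping (illustrative only).

    - Uniqueness on a single role: the fact becomes a column of the table
      keyed by the object type playing that role.
    - Spanning uniqueness (many-to-many): the fact becomes its own table,
      keyed on both roles.
    Each fact type lands in exactly one table, so no fact is stored twice.
    Returns {table_name: [columns...]}, the first column being (part of) the key.
    """
    tables = {}
    for ft in fact_types:
        if ft.unique_on == "first":
            tables.setdefault(ft.first, [ft.first]).append(ft.second)
        elif ft.unique_on == "second":
            tables.setdefault(ft.second, [ft.second]).append(ft.first)
        elif ft.unique_on == "both":
            # Hypothetical naming scheme for the link table.
            tables[ft.first + ft.second] = [ft.first, ft.second]
        else:
            raise ValueError(f"unknown uniqueness option: {ft.unique_on}")
    return tables


facts = [
    FactType("Employee works for Department", "Employee", "Department", "first"),
    FactType("Employee earns Salary", "Employee", "Salary", "first"),
    FactType("Employee speaks Language", "Employee", "Language", "both"),
]
print(map_to_tables(facts))
# {'Employee': ['Employee', 'Department', 'Salary'], 'EmployeeLanguage': ['Employee', 'Language']}
```

The functional fact types collapse into one Employee entity, while the many-to-many fact type becomes an intersection entity, which is the shape one would expect on a hand-drawn ERD.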

Whilst many of the underlying principles have existed in various forms since the 1970s, ORM as described here was first formalised by Dr. Terry Halpin in his PhD thesis in 1989. Since then, a number of books and publications have followed from Dr. Halpin and other advocates. Interestingly, Microsoft made some investment in ORM in the early 2000s, implementing ORM as part of the Visual Studio for Enterprise Architects (VSEA) product. VSEA offered tool support in the form of NORMA (Natural ORM Architect), a memorable acronym. International ORM workshops are held annually; the ORM2014 workshop takes place in Italy this month.

In terms of tool support, ORM2 stencils are available for both Visio and OmniGraffle.

ORM Example
The technique is best described in the ORM2 whitepaper listed in the references. I won’t attempt to replicate or paraphrase that content; instead, a very basic illustrative model is provided, simply to give a sense of how a conceptual model appears.

[Figure: ORM2 basic example]
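The data population checks that ORM relies on can be sketched in a few lines. The example assumes a hypothetical fact type "Employee works for Department" with made-up sample facts, and tests where a uniqueness constraint can sit: a role can carry the constraint only if no value repeats in that position across the de-duplicated population. The function name and data are illustrative only.

```python
def uniqueness_violations(population, role_index):
    """Return the values that break a uniqueness constraint on one role.

    population: elementary facts as tuples, e.g. ('E1', 'Sales') for
        'Employee E1 works for Department Sales'.
    role_index: position of the role the constraint is placed on.
    """
    seen, violations = set(), set()
    for fact in set(population):  # a population is a set: ignore exact duplicates
        value = fact[role_index]
        if value in seen:
            violations.add(value)
        seen.add(value)
    return violations


works_for = [("E1", "Sales"), ("E2", "Sales"), ("E3", "Finance")]
print(uniqueness_violations(works_for, 0))  # set(): each employee has one department
print(uniqueness_violations(works_for, 1))  # {'Sales'}: a department has many employees
```

A sample population like this cannot prove a constraint holds, but a single counterexample is enough to show that a proposed constraint is wrong, which is why such checks are run early and repeatedly.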

Final Thoughts
In most cases a conceptual data model can be an unnecessary overhead; however, where data requirements are emergent or sufficiently complex to warrant a distinct analysis and design process, applying Object Role Modelling can be highly beneficial. Understanding the potential of such techniques is, I think, the most important aspect: a good practitioner should have a broad range of modelling techniques to call upon.

References
Official ORM Site
ORM2 Whitepaper
ORM2 Graphical Notation
OmniGraffle stencil on Graffletopia

Custom Settings Naming Conventions

A quick post to share some thoughts on standardising the naming conventions applied to Custom Settings. With Custom Objects it is an obvious best practice to mirror exactly the conventions applied by the Standard Objects; there is no reason not to adhere to this approach, none. With Custom Settings, however, there is no comparable reference, and as such a great deal of variation exists in the naming conventions applied. For many this matters little, but I’d make the usual arguments about maintenance, readability, build quality through predictable convention and so forth. There’s also merit in clearly distinguishing Hierarchy from List types, and in giving the latter meaningful names, as they can be referenced across many declarative build elements.

I use the naming conventions defined below for simplicity and clarity. All that really matters is that a standardised approach is taken.

1. Custom Setting Label – Pluralised in all cases (e.g. Data Sources). No “Setting[s]” suffix.

2. API Name – List Settings
– [Data entity that each list entry represents]ListSetting__c

Each record represents an individual entry, so singular naming is applied, as with objects.

Examples – List
e.g. Analytic Views – AnalyticViewListSetting__c
e.g. Data Sources – DataSourceListSetting__c

3. API Name – Hierarchy Settings
– [Function of the settings]Settings__c

Each record represents the same set of settings applied at a different level (organisation, profile or user). Conceptually this differs from objects and List settings, and the plural naming reflects this.
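The "same settings at different levels" idea can be sketched as a lookup that returns the most specific value set for a field: user first, then profile, then the org-wide default. As I understand it, this mirrors the "lowest level wins" behaviour that the Apex Custom Settings methods (for example `getInstance()`) apply to Hierarchy settings; the plain dictionaries and field names below are hypothetical stand-ins, not the platform API.

```python
def resolve_setting(field, user_values, profile_values, org_values):
    """Return the most specific value for `field`, or None if unset everywhere.

    Levels are checked from lowest (user) to highest (org default); a value of
    None at a level is treated as 'not set' so the next level is consulted.
    """
    for level in (user_values, profile_values, org_values):
        value = level.get(field)
        if value is not None:
            return value
    return None


# Hypothetical fields on an OrgBehaviourSettings__c record at each level.
org = {"BatchSize__c": 200, "EnableAudit__c": False}
profile = {"BatchSize__c": 50}
user = {}

print(resolve_setting("BatchSize__c", user, profile, org))    # 50 (profile overrides org)
print(resolve_setting("EnableAudit__c", user, profile, org))  # False (org default)
```

Note that `False` is a genuine value, not "unset", which is why the sketch tests for `None` explicitly rather than relying on truthiness.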

Examples – Hierarchy
e.g. Org Behaviour Settings – OrgBehaviourSettings__c
e.g. My App Settings – MyApplicationSettings__c
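Because the convention is mechanical, it can be checked automatically, for instance against API names gathered from a metadata retrieve. The sketch below is a hypothetical helper that encodes only the two API-name patterns above (a PascalCase singular entity plus `ListSetting__c` for List settings; a PascalCase function name plus the plural `Settings__c` for Hierarchy settings). It does not attempt to check labels.

```python
import re

# Singular entity name in PascalCase followed by ListSetting__c.
LIST_SETTING = re.compile(r"[A-Z][A-Za-z0-9]*ListSetting__c")
# Function of the settings in PascalCase followed by the plural Settings__c.
HIERARCHY_SETTING = re.compile(r"[A-Z][A-Za-z0-9]*Settings__c")


def follows_convention(api_name, setting_type):
    """Check a Custom Setting API name; setting_type is 'List' or 'Hierarchy'."""
    if setting_type == "List":
        return LIST_SETTING.fullmatch(api_name) is not None
    if setting_type == "Hierarchy":
        # A pluralised list-style name should not pass as a hierarchy name.
        return (HIERARCHY_SETTING.fullmatch(api_name) is not None
                and not api_name.endswith("ListSettings__c"))
    raise ValueError(f"unknown setting type: {setting_type}")


for name, kind in [("DataSourceListSetting__c", "List"),
                   ("OrgBehaviourSettings__c", "Hierarchy"),
                   ("DataSourceSettings__c", "List")]:
    print(name, kind, follows_convention(name, kind))
```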

As a further best practice, any logic applied to the population of the Name field should be described in the Custom Setting’s Description field. If the population is immaterial, simply state so. Given the inability to relate settings at the attribute level, the Name field may end up playing some role in grouping related settings.

Finally, where possible, always retrieve Custom Settings data via the Custom Settings methods rather than a SOQL query. A SOQL query bypasses the application cache and counts against the governor limits of the Apex transaction.

Spanning Relationships Limit

Experienced Salesforce technical architects will always look to declarative solution options before considering technical alternatives. This type of thinking is best practice and indicative of an architect who is mindful of TCO (total cost of ownership), future maintenance etc. In my case I’ll go as far as challenging requirements so that I can deliver a fit-for-purpose solution using declarative features. In the main this is exactly the right approach, but in some cases there’s more to consider. Let me explain.

Large Data Volumes (LDV) is currently a hot topic within the Salesforce technical community, and there are some great resources and prescriptive guidance available. But aside from the data aspect, what about orgs with large and complex declarative and technical customisations? In some cases such Large Customisation Orgs (LCOs) have grown organically; in others the org is being used as a platform to deliver non-CRM functionality such as complex portal solutions, ERP etc. In either case we are talking about an org with large numbers of custom objects, workflow rules, formula fields, Apex scripts and so on. In such a scenario it is highly likely that the LCO will be constrained by capacity limits (maximum number of users, custom objects, data size etc.) or execution limits (governor limits applied to Apex scripts etc.).

In the organic growth case, where an org may have started life in one business division and then expanded across the enterprise, there will certainly come a point where a multi-org strategy becomes the only option, as continual refactoring and streamlining to provide additional headroom eventually ceases to be viable. In light of this, multiple-org architectures are becoming more commonplace, with enterprises partitioning along organisation structure or business process boundaries, enabling localised innovation and growth with some data sharing and consolidation. That said, the transition from the single-org to the multiple-org model is potentially costly and disruptive, so the design factors that optimise the longevity of a single-org implementation in the face of organic growth are key for an architect to understand and implement from the outset. A firm understanding of the applicable limits for the org type and user licensing model is the best starting point, combined with practical experience of where limits are soft (and can be increased by salesforce.com support) and where they are hard platform constraints. The latter type of limit is the most relevant to the goal of optimising org longevity; the Spanning Relationships limit is one example.

Spanning Relationships Limit
This limit constrains the total number of unique object relationships that can be referenced in declarative build elements (workflow rules, validation rules etc.) associated with a single object. This is a significant constraint on larger data models, and it typically surfaces first on the central standard objects (Account, Contact, Case etc.). The soft limit here is 10 and the hard limit is 15, although performance degradation should also be considered at anything above 10. When this capacity limit is reached, the only options are to refactor the declarative implementation or to fall back on Apex script solutions. It is therefore critical to be aware of this hard limit when designing a data model, and also when adding declarative elements that introduce a new relationship traversal. There may be an argument for some level of denormalisation in the physical data model; a Salesforce data model is generally unlikely to be in 3NF anyway, and unlike in a traditional RDBMS, a data model optimised for storage is not always the right approach.
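To keep an eye on headroom, the distinct relationships referenced from one object's declarative elements can be tallied. The sketch below assumes that each distinct relationship hop along a reference path counts once (so `Account.Owner.Email` from Case uses two: `Account` and `Account.Owner`) and that repeated references to the same path count once; the exact counting rules should be confirmed against current Salesforce documentation. The reference list is hypothetical, and the thresholds are the 10/15 figures quoted above.

```python
SOFT_LIMIT = 10  # figures as quoted in this post
HARD_LIMIT = 15


def relationship_paths(reference):
    """'Account.Owner.Email' -> {'Account', 'Account.Owner'}; the last segment is a field."""
    parts = reference.split(".")
    return {".".join(parts[:i]) for i in range(1, len(parts))}


def spanning_relationship_count(references):
    """Count distinct relationship paths across all field references on one object."""
    paths = set()
    for ref in references:
        paths |= relationship_paths(ref)
    return len(paths)


def headroom(count):
    """Classify a count against the soft and hard figures above."""
    if count > HARD_LIMIT:
        return "over the hard limit"
    if count > SOFT_LIMIT:
        return "above the soft limit"
    return "within the soft limit"


# Hypothetical references from workflow rules, validation rules and formulas on Case.
case_refs = ["Account.Name", "Account.Owner.Email", "Contact.Email", "Account.Industry"]
count = spanning_relationship_count(case_refs)
print(count, headroom(count))  # 3 within the soft limit
```

Running a tally like this before adding a new cross-object reference makes it obvious whether the change consumes a new relationship or reuses an existing one.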

Returning to my original point: when considering declarative solution options versus technical alternatives, the complexity of the data model and the capacity limits applied to the declarative build model are also factors. There’s no silver bullet answer.

Salesforce Logical Data Models

A robust and intelligent data model provides the foundation upon which a custom Salesforce implementation can be built. Mistakes made in the functional or technical build are typically inexpensive to rectify (if caught quickly enough); a flawed data model, however, can be extremely costly to remediate in both time and money. At the start of every project I produce a logical data model (an example is provided below). This starts out as blocks and lines and is improved iteratively to include physical concerns such as org-wide defaults, relationship types etc. Only after a few revisions will I consider actually creating the model as custom objects. I use OmniGraffle for such diagrams.

Summer ’12 Lookup Relationships

The Summer ’12 release introduces some fundamental changes to the functionality of lookup relationships. In summary:

1. Optionality. Lookup relationships can now be set as mandatory (Required attribute). This is great news, as the usual enforcement via validation rules is no longer needed.

2. Referential integrity. Prior to Summer ’12, the parent record could be deleted without regard to related child records, which would have their lookup fields nulled. This remains the default behaviour. You can, however, specify one of the following behaviours for optional lookups; for mandatory lookups only 2.2 and 2.3 are possible. I’m using the terms parent and child here in the loosest sense for convenience; the nature of the relationship is associative.

2.1 Clear the child field value – default
2.2 Prevent the parent being deleted (Don’t allow deletion of the lookup record that’s part of a lookup relationship)
2.3 Cascade delete (Delete this record also) – This one requires activation by salesforce.com support and ignores the sharing model, meaning that if a user has record access to the parent, they can delete it and its related children without requiring permissions at the child level. It is also restricted to cases where the child is a custom object.

The above enhancements to the lookup relationship type start to blur the lines between lookup and master-detail; however, the key distinction remains ownership versus association.
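The three deletion behaviours described above can be modelled in a small simulation to make their differences concrete. Everything here is hypothetical: the record shape (a dict with a `parent_id` lookup) and the function are illustrative only. The enum values mirror what I understand to be the `deleteConstraint` values in the Metadata API (`SetNull`, `Restrict`, `Cascade`), and the sharing-model nuance of cascade delete is not modelled.

```python
from enum import Enum


class DeleteBehaviour(Enum):
    CLEAR = "SetNull"      # 2.1 clear the child's lookup field (default)
    RESTRICT = "Restrict"  # 2.2 prevent the parent being deleted
    CASCADE = "Cascade"    # 2.3 delete the related child records too


class DeletionBlocked(Exception):
    """Raised when a Restrict lookup prevents the parent being deleted."""


def delete_parent(parent_id, children, behaviour, required=False):
    """Return the child records that remain after deleting `parent_id`."""
    if required and behaviour is DeleteBehaviour.CLEAR:
        # A mandatory lookup cannot be left empty, so only 2.2 and 2.3 apply.
        raise ValueError("clearing is not an option for a required lookup")
    related = [c for c in children if c["parent_id"] == parent_id]
    if behaviour is DeleteBehaviour.RESTRICT:
        if related:
            raise DeletionBlocked(f"{len(related)} child record(s) reference {parent_id}")
        return list(children)
    if behaviour is DeleteBehaviour.CASCADE:
        return [c for c in children if c["parent_id"] != parent_id]
    # CLEAR: keep every child but null the lookup on those that referenced the parent.
    return [dict(c, parent_id=None) if c["parent_id"] == parent_id else c
            for c in children]


children = [{"id": "c1", "parent_id": "p1"}, {"id": "c2", "parent_id": "p2"}]
print(delete_parent("p1", children, DeleteBehaviour.CLEAR))
# [{'id': 'c1', 'parent_id': None}, {'id': 'c2', 'parent_id': 'p2'}]
print(delete_parent("p1", children, DeleteBehaviour.CASCADE))
# [{'id': 'c2', 'parent_id': 'p2'}]
```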