I’ve been spending much of my time lately revising for my first Salesforce Architect exam: Certified Data Architecture and Management Designer. I’m really happy to say I passed it last week, so I’m going to share a few key tips with you. They won’t make you pass the exam – only you can do that – but they may give you some more recent confirmation of the topics covered. Salesforce changed the resources only about 10 days ago, and their Inkling guide has now gone.
How I Studied for the Exam
I started, as always, with Trailhead. I put a Trailhead Mix together for this certification, which you are most welcome to make use of; I will update it periodically as new content comes out. Experience has taught me not to ignore the study guide, so I then worked through it and made my own notes on each article listed in it. There is a really useful webinar about large data volumes, which I watched, and then I read through some real-life examples of how NOT to manage large data volumes – sadly they made familiar reading! Once I was familiar with those, I started working with a brilliant Quizlet set written by Patricio Penaherrera. Thank you for sharing that, Patricio – it was a really useful way to revise while I was on the home stretch.
Another good blog to read is by Maciej Jozwiak – thank you, Maciej.
So, for those of you who’ve read the study guide, these topics won’t be a surprise to you, but where I can recall any of the question themes I will put them here for you. Good luck and please do tweet me if this has helped you!
Large Data Volumes
Read this. Comb it. Know it.
Types of API
- SOAP API
- REST API
- Bulk API
- Streaming API
- Metadata API
- I was asked about techniques for managing data skew:
- Ownership skew – many records with the same owner – this can slow performance
- Lookup skew – many child records pointing at a single lookup record – think lots of contacts against a single account
Note that skewed data can cause lock contention if you are doing lots of operations, e.g. updates. Take an account with many contacts as an example: the more contacts sitting against an account, the more likely you are to encounter lock contention, since each contact update locks and then unlocks the parent account.
- I was also asked whether you would give a user who owns >10k records no role, a role at the top of the hierarchy, or a role at the bottom of the hierarchy, knowing how this can affect performance. Hint: see the Sharing & Visibility Designer resources for a clue!
SOQL vs SOSL
- Know the difference between the two:
- SOSL: runs a query using a text string, across multiple objects, searches indexes first, 2k records max
- SOQL: runs a database query using SELECT, within specified objects, you choose whether to select indexed fields when writing the query, 50k records max
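To make the contrast concrete, here’s a quick illustration – the object names, field lists, and the “Acme” search term are just examples:

```sql
-- SOSL: text search across multiple objects, hits the search indexes first
FIND {Acme*} IN NAME FIELDS
RETURNING Account(Id, Name), Contact(Id, LastName)

-- SOQL: database query against a specified object, SELECT-style
SELECT Id, Name
FROM Account
WHERE Name = 'Acme Ltd'   -- filtering on an indexed field keeps the query selective
```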
I found it helpful to know off by heart:
- The field types that are indexed automatically by Salesforce
- The types of fields that you can build custom indexes for
- Remember: External IDs are always indexed (that came up for me); it also helps if you can remember which field types can be external IDs
- How you can make queries run faster by selecting indexed fields as a priority
Deterministic Formula Fields
Nowhere I looked spelled out that these are basically formula fields that don’t pull data through from other objects or use any dynamic dates, e.g. TODAY() and NOW() – which means they can be indexed! I had to figure it out by looking at what makes a formula field non-deterministic. A question did come up on this.
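As a quick illustration – the custom field names here are made up – compare:

```
Amount__c * 0.2                     deterministic: static, same-object values only, so indexable
TODAY() - DATEVALUE(CreatedDate)    non-deterministic: uses a dynamic date
Account__r.Industry                 non-deterministic: pulls data through from another object
```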
Skinny Tables
- Know what they are and how to enable them (you have to ask Salesforce to create them)
- Know what types of fields can be included in them
- Know what the limitations are:
- You can’t add new fields to the table yourself – you’ll need to get Salesforce to drop and re-create it
- They only copy down to Full sandboxes
- Know how the Bulk API works:
- You create a job for one operation (query/insert/update/delete/upsert)
- You split your data into batches and submit them to the job
- The Bulk API loads the data into temporary tables
- It processes the batches asynchronously
- Know when to use Serial mode vs Parallel mode and what the default is
- Know how bulk queries work (the following diagram shows my visual representation of what the materials say)
- I was asked about the best way to handle lock contention during a data load; I knew from my studying that the best approach is to re-order the records by their parent ID so the load handles each parent in sequence – i.e. lock – insert – unlock – on to the next one.
- The question was about loading Order Line Items. I chose to re-order the .csv file by Order ID (there was another option to sort by Order Product ID – that’s there to catch you out!) and then load them in
- Lots of the materials describe what happens when the Bulk API encounters locks – I have again visualised this for you. There were a few questions in the exam about it
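The re-ordering trick can be sketched in a few lines of Python. This is purely illustrative – the `OrderId` field name and the tiny batch size are made up for the example; the real Bulk API limit is 10,000 records per batch:

```python
# Sketch: pre-sort child records by parent ID so rows that lock the same
# parent sit next to each other (and land in the same batch), then split
# into Bulk-API-sized batches.

def prepare_batches(records, parent_field, batch_size=10_000):
    # Sorting by the parent ID groups each parent's children together,
    # so a given parent is locked by at most one batch at a time.
    ordered = sorted(records, key=lambda r: r[parent_field])
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]

line_items = [
    {"OrderId": "801xx0000000002", "Qty": 1},
    {"OrderId": "801xx0000000001", "Qty": 5},
    {"OrderId": "801xx0000000001", "Qty": 2},
]
batches = prepare_batches(line_items, "OrderId", batch_size=2)
# Both rows for order ...001 end up together in the first batch.
```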
Primary Key Chunking
- Know what it is (a strategy for optimising large queries by splitting them into chunks of sequential record IDs)
“With this method, customers first query the target table to identify a number of chunks of records with sequential IDs. They then submit separate queries to extract the data in each chunk, and finally combine the results.” – Bud Vieira
- Know when to use it
- Know what issues it can solve
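The idea behind PK chunking can be sketched with plain integers – this is a simplification of mine, not how the platform implements it: real Salesforce IDs are 15/18-character base-62 values, and the Bulk API computes the ranges for you when you set the Sforce-Enable-PKChunking header:

```python
# Sketch of the PK-chunking idea using integer IDs for simplicity.

def pk_chunks(min_id, max_id, chunk_size):
    """Yield (lower, upper) ID bounds covering [min_id, max_id]."""
    lower = min_id
    while lower <= max_id:
        upper = min(lower + chunk_size - 1, max_id)
        yield lower, upper
        lower = upper + 1

# Each range becomes its own small, selective query, e.g.
#   SELECT Id, Name FROM Account WHERE Id >= :lower AND Id <= :upper
ranges = list(pk_chunks(1, 250_000, 100_000))
# -> [(1, 100000), (100001, 200000), (200001, 250000)]
```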
- I had a question that gave me a SELECT statement, asked me to compare it to another one and pick the one that would run the fastest. Watch out for the one that contains only indexed fields (these run faster!!).
- One of my questions asked me to name 2 tools I could use to deal with duplicates. I chose Data.com Clean and Salesforce Duplicate Management
- Know which objects you can use Duplicate Management for
- Know what Clean can do to help (I had 2 questions on this)
- Know the difference between Clean and Prospector
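Inside Salesforce you’d lean on matching rules and duplicate rules for this, but the underlying idea of a matching key can be sketched in Python – the normalisation choices below are just examples, and real matching rules use much fuzzier logic:

```python
# Sketch: group contact records by a normalised matching key so that
# likely duplicates surface together.
from collections import defaultdict

def find_duplicates(contacts):
    groups = defaultdict(list)
    for c in contacts:
        # Illustrative matching key: case-folded email plus last name.
        key = (c["Email"].strip().lower(), c["LastName"].strip().lower())
        groups[key].append(c)
    # Keep only keys with more than one record behind them.
    return {k: v for k, v in groups.items() if len(v) > 1}

dupes = find_duplicates([
    {"Email": "gemma@example.com", "LastName": "Smith"},
    {"Email": "GEMMA@example.com ", "LastName": "smith"},
    {"Email": "someone@example.com", "LastName": "Jones"},
])
# One duplicate group: the two "gemma@example.com" records.
```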
- A question came up around managing architecture between a legacy CRM, Salesforce, and an ERP. It asked whether you should integrate all three directly, whether you should use web services, etc. It KILLED me that I couldn’t have any pen and paper to draw this out.
- I chose to have the ETL tool in the middle, integrating the ERP with Salesforce, and the legacy CRM with the ERP.
Data Quality & Governance
- I was asked which responsibilities (2 out of 4 options were correct) would fall under data governance. So it’s worth understanding the differences between data architecture, data stewardship, and data governance in an enterprise environment:
- Data Governance: The management of availability, usability, integrity and security. Usually includes a committee or council who defines procedures and plans the execution of those procedures
- Data Stewardship: The management of an organisation’s data assets, to give users high quality data (e.g. a Business Intelligence / Reporting department within an organisation)
- Data Architecture: The management of the database itself
- Know the typical data quality considerations: age, completeness, accuracy, consistency, duplication, and usage
- A few questions came up on how you can improve data quality; some answers included dashboards and Data.com
- Generally developing a data management plan that covers the 3 areas above is a good answer
Relationship to the Integration Architecture Exam
I’d recommend doing this exam before you try the Integration Architecture exam, as there are definitely some overlaps in content between the two. The Integration Architecture exam includes content around backup optimisation, so having knowledge of PK chunking and of avoiding timeouts will stand you in good stead for it. I’d recommend you do these two exams relatively close together, while the information is still fresh in your mind.
That’s all I can remember for now, but I will keep updating this article as more comes back. I wish you all the very best of luck in passing your certification exam! Tweet me @gemziebeth if you’d like to ask any specific questions :0)