The Biggest Challenges of Moving To NoSQL - DZone Database

9/25/2017 The Biggest Challenges of Moving to NoSQL - DZone Database
The Biggest Challenges of Moving

to NoSQL
by Judah Himango · Sep. 20, 17 · Database Zone
Like (5) Comment (0) Save Tweet
Traditional relational databases weren’t designed for today’s

customers. Learn about the world’s first NoSQL Engagement
Database purpose-built for the new era of customer experience.
We are in the midst of a shift towards NoSQL data stores across the software world,
especially in the web and mobile space. Many developers and enterprises are
migrating or looking to do so, yet a great chasm exists between traditional SQL
databases and NoSQL databases. What are the challenges facing developers moving
from SQL to NoSQL?
Personally, I've experienced both worlds, having developed and maintained SQL and
NoSQL apps and migrated several large apps to NoSQL. In this article, I pass that
wisdom on to you, dear reader: pitfalls to avoid, riches to reap, new ways of thinking.
My experience is primarily with RavenDB, a NoSQL database built on .NET Core, but
the lessons here are general enough to apply to many NoSQL databases.
Let's dig in.
Breaking Out of the Relational-Only Mindset

A primary challenge when moving to NoSQL databases is, surprisingly, human rather
than technical.
NoSQL changes the storage game, yet humans persist in old modes of thinking. Like a
developer writing C++ using Visual Basic paradigms or attempting mutable object-
oriented programming in functional Haskell, so also is the developer who exclusively
uses relational database ideas in a NoSQL database. The outcome is often ugly and
fails to realize the benefits of NoSQL stores.
Consider, for example, the shape of your objects. Developers coming from a relational
database world invariably default into a model friendly to relational databases, even if
https://dzone.com/articles/the-biggest-challenges-of-moving-to-nosql?oid=facebook&utm_content=buffer269e2&utm_medium=social&utm_source=fac… 1/9
that model is not friendly to the app itself. We have been bending to the will of the
database rather than the database bending to our apps.
In NoSQL, models should be based on how the app interacts with the data rather than
how the model can be stored as rows in one or more tables.
In one app that I migrated to NoSQL, we had a table ManufacturingCompany and a

table Contacts . Each ManufacturingCompany had one or more Contacts ; a one-to-many
relationship via foreign keys. In our code, we had a ManufacturingCompany class that had a
list of contacts. Our Object-Relational Mapper (ORM) then populated that list via a
SQL join.
We had to break out of the SQL mindset and realize we didn't need to shape our model
this way. In our case, contacts were never updated outside the context of a company.
We didn't need to store contacts separately from manufacturing companies; we only
did so because SQL wouldn't let us store a list of complex objects inside another
object.
In object-oriented programming, we have a term for this: encapsulation. NoSQL

databases, especially document databases like RavenDB, MongoDB, and CouchDB,
allow for encapsulation. We changed ManufacturingCompany to contain a list of contacts, no
need for foreign keys, relationships to other documents, JOINs, or ORM mapping.
Freedom! Store Arrays, View Models, and Complex Objects

In another part of our app, we had an entity Graphic that had a list of Rankings . A
ranking was really just a number. But since we cannot easily store a list of numbers in
SQL (it would require extra coding every time we loaded, queried, or updated
a Graphic ), we took the path of least resistance and created a table Ranking , with a
foreign key back to the Graphic table; once again, a one-to-many relationship.
But in NoSQL, we were able to simply store the rankings as an array of integers right
on the Graphic object; once again, simplifying our model and improving performance
through eliminated joins.
In another instance, we had a complex stored procedure that pieced together data
from over 20 tables on the fly to display a complex admin UI dashboard. Moving to
NoSQL, we realized this was no longer needed. Firstly, we could store complex objects
as a single entity, greatly reducing the number of joins. Secondly, we utilized
RavenDB's Index support to author an index that pulled in the remaining pieces of
data. This index is then pre-computed and automatically maintained as data is
inserted into the database, giving us blazing fast queries over a much simpler model.
The devs rejoiced.
Leveraging Non-Integer Keys

Keys (that is, unique IDs) in a NoSQL database are often strings rather than integers
or GUIDs.
For example, in RavenDB, it's common to see documents with IDs

like graphics/1 or manufacturingCompanies/32 . These keys are strings, they are unique in the
database, and they are assigned automatically when storing your object in the
database.
This felt strange and foreign to our dev team in the beginning. But over time, we found
advantages in string keys. In some scenarios, well-known names can be used as keys,
resulting in fewer queries and faster lookup times.
In our case, we had users in our software system with unique emails. It's not possible
to have two users with the same email address. Rather than store users as users/1 , we
supply a well-known key of users/foo@email.com , utilizing the unique email address as
part of the unique key.
When it comes time to log in a user, we don't have to execute a costly query looking for
a user with that email address nor dedicate an index. Instead, we can simply load a
user directly by its known key, a highly efficient and fast operation.
We used the same technique for session tokens for users, user roles, and more.
Maintaining Identity Within Non-Roots

Another hurdle when moving to NoSQL was that since not all objects are roots (that
is, some objects exist merely as a child inside a parent object), how does one maintain
identity within these non-roots?
In the previous case, we had moved Contact objects as child objects of a

parent ManufacturingCompany . Since contacts are no longer root objects, they don’t get
assigned a key or ID by the database.
Often, this is acceptable. We found many cases in which we thought we needed IDs on
child objects but we didn’t actually need them.
But in a few scenarios, we did need to uniquely identify child objects. Perhaps we
needed to send an email to a particular contact and do some logging off that contact.
In such cases, we initially tried referring to the contact by index, i.e. the fourth contact
inside ManufacturingCompanies/32 . But this turned out to be ugly and error-prone. We
instead moved to a model where when we stored a ManufacturingCompany , we manually
assigned a GUID to contact.Id — essentially a manual way of assigning identity. We
found this sufficed for the few scenarios where nested objects needed to be identified
and manipulated.
Yes, Relationships Are Permitted in NoSQL

Folks outside the NoSQL world often believe that moving to NoSQL means removing
all relationships between entities. While this may be true of some NoSQL systems
(particularly key-value stores), document databases like RavenDB not only permit
relationships between entities but also make them simple to work with.
In RavenDB, our Graphic object refers to another object, Film . To link these
documents, we put a .FilmId property on the Graphic . At query time, RavenDB provides
an .Include method to join the related Film into the results.
In short, relational data works fine with NoSQL. Instead of forcing you into a
particular model, NoSQL databases give you additional freedom in how you shape
your objects — whether you're dealing with traditional relationships between distinct
entities or child objects embedded inside a parent. Both are supported in NoSQL.
Recognize this freedom for a smoother transition to NoSQL.
Technical Challenges
Many of the above issues are human and psychological; changing modes of thinking
for new paradigms.
But there are technical challenges, as well, and we encountered several while
migrating apps to NoSQL.
Sacrificing ACID
NoSQL databases are not a panacea. They make tradeoffs to achieve their high
performance and high scalability. Usually, this comes in the form of sacrificing one or
more staples of SQL databases: atomic, consistent, isolated, durable (ACID)
guarantees are forfeited to gain performance or scalability.
For example, atomic writes (in which all changes take place, or none of them do) is a
wonderful feature of traditional relational databases. However, it comes at the cost of
performance and internal database complexity. Many NoSQL databases do not offer
atomic writes for multiple documents, and in return give better performance.
And while consistency is another great feature of SQL databases, it impedes the ability
to scale out a database across multiple nodes, so many NoSQL databases give up
consistency.
This moves the onus for data correctness into your app. That sounds scary! But in
practice, this was already the case in relational databases on a lesser level; ACID
properties could never guarantee that the data was sound from a logical perspective
but only guarantee correctness based on the keys, triggers, and constraints setup.
No Transactions Unless You’re Sophisticated

We found RavenDB to strike a nice balance between the safe, traditional world of
relational databases and the Wild West that is often NoSQL. In RavenDB, transactions
are supported, thus guaranteeing atomic writes. RavenDB’s API requires you to begin
a transaction and end with committing that transaction; all writes in between are
atomic.
Other NoSQL solutions, including MongoDB, don’t offer transactions or atomic multi-
document writes. Moving to these NoSQL platforms requires consideration of
additional points of failure: If you need two documents inserted together, what
happens if the first succeeds and the second fails? If your NoSQL database doesn’t
support transactions, you’ll either hope for the best (you’d be surprised how many
developers do this!) or roll your own two-phase commit.
Eventual Consistency
Instead of offering ACID properties, NoSQL databases typically offer BASE properties
(basically available, soft state, eventually consistent). This is a fancy way of saying that
NoSQL databases punt on all the super nice things that we get for free in a relational
database.
Remember why? NoSQL punts on one or more of the ACID guarantees in order to
provide better performance, scaling, and more. Tradeoffs.
RavenDB offers a compromise — writes are ACID. Reads by ID are ACID. But queries
(that is, querying on something other than ID) are BASE.
The end result is that your data remains consistent with atomic, multi-document
writes, and your queries remain fast, running against pre-computed BASE indexes.
The tradeoff? The indexes are eventually consistent; they might return stale data.
Updated an entity 1ms ago? Querying it now might return the old data.
Our team encountered this issue later in production, typically when our data was
larger and our indexes took longer to update. We handled this using RavenDB-specific
patterns. Raven offers a .WaitForIndexesAfterSaveChanges(…) that instructs Raven to block
when committing a transaction until the changes are updated in the necessary
indexes. And for scenarios where up-to-the-millisecond data is needed, RavenDB
offers various .WaitForNonStaleResults methods.
Other NoSQL databases offer different kinds of querying guarantees. MongoDB

indexes are updated synchronously, which means no stale data, but blocking writes.
Each NoSQL database has its own guarantees, and a prudent developer will
investigate and weigh the offerings.
Planning for Orphaned Data

Invariably, in NoSQL databases, you need to plan for data that may break business
rules. In our case, sometimes, our related objects might no longer exist; i.e.
a Graphic object with a .FilmId of films/42 , but no such film exists.
While SQL databases would have foreign keys with consistency to ensure the database
is always in a valid state, most NoSQL databases have no such guarantee.
This was a major concern for our team initially, but over time, we have discovered this
to be only a minor issue. Ultimately, it reveals the existence of a logic bug in
application code. We handle these issues with judicious defensive programming and
occasional logging.
No Stored Procedures
Many NoSQL databases don’t have the concept of stored procedures. Our team again
thought this would be a major issue. But instead, we found it liberating. Moving
business logic into the app itself, rather than in the database, allowed us to have better
control over the requirements and behavior. Typically, what we expressed in a few
hundred lines of SQL could be concisely stated in a dozen lines of modern object-
oriented app code.
And for the times we truly needed code in the database to be shared across queries, we
found RavenDB’s indexes, transformers, and patches to fit the bill.
Lack of Triggers; Overcome With Listeners

A minor issue we encountered when moving to NoSQL was a lack of triggers. One app
we ported from MySQL used triggers to update dates stored on entities as they were
inserted. Another used triggers to fire off external processes that would mutate data in
other tables.
Most NoSQL options provide a means to work around this, however. MongoDB, for
example, allows users to utilize tail-able cursors to monitor for changes.
RavenDB has a built-in feature known as Subscriptions, which allows you to

programmatically subscribe to changes to any document, collection, or query. We
used this feature in our app to do geolocation lookups of addresses users stored in the
system. We .Subscribe to the Company collection, and if a new or updated company
address is inserted, we fire off an asynchronous geolocation API. While initially
skeptical, we found this a cleaner and more powerful model than SQL triggers.
Patching and Model Schemas

NoSQL databases promise the flexibility of schema-less data. And it’s true; there are
no constraints on the type or shape of data you store in the database.
Ultimately, though, your app intrinsically does have a schema. Your users have
an .EmailAddress , and so on. What happens when you need to evolve your objects? Does
your database evolve with it? How will you handle old versions of your objects stored
in the NoSQL database?
In the relational world, we typically handled this using database upgrade scripts that
would modify tables, columns, and data in those tables. Object-relational mappers like
Entity Framework would further abstraction over the same concept. Failure to update
the database schema in the face of new objects results in runtime breakage.
In NoSQL, the story is quite a bit brighter. In our experience, NoSQL databases tend
to “roll with what you have” — if you added a .IsFoo property to your users, no action
is needed to update your data. Querying for old users will return the default value in
your programming language; .IsFoo will be false.
For more advanced scenarios, NoSQL databases offer various means to patch existing
documents in the database. Mongo offers an .updateMany method to patch documents in
a collection. Raven has similar functionality to execute a patch, written in JavaScript,
across a collection of documents, as simple as this.IsFoo = true .
In our larger apps, we’ve found that it’s wise to have DatabaseSchemaVersion document.
Our app then executes patches based on the DatabaseSchemaVersion value during startup.
Moving an Existing SQL App to NoSQL

The strategy that’s worked well for us on multiple apps is to first identify the new
shape of the model.
As discussed above, a SQL model shouldn’t map directly to a NoSQL model. Do what
makes sense for your app and model your objects accordingly. Should the Album object
embed a list of songs, or should the songs be stored as separate documents? Do what
makes sense for your app.
Once a data model has been identified, migrating the data usually involves exporting
to CSV or some other independent format. Most NoSQL databases support importing
JSON or CSV dumps. We import these, and then run patches on them to move the
data into our new, app-friendly NoSQL model.
Alternately, NoSQL databases typically provide some form of writing documents in

bulk. RavenDB offers BulkInsert , allowing high-performance batched import of data.
We’ve found this to also be successful and easy to work with: writing a method in that
reads in a JSON or CSV file dump from SQL, loading that data into our new NoSQL
model, and storing it via bulk insert into the database.
Finally, a less disruptive means of adopting NoSQL can be as a supplement to, rather
than a replacement to, relational databases. We’ve adopted this strategy in one app
where the effort to replace SQL was too great; we adopted a NoSQL database to help
when dealing with complex objects that didn’t fit well as rows in tables.
Conclusion
The biggest challenges in adopting NoSQL are usually human rather than technical.
NoSQL is a new way of thinking about data storage, and too often, devs are spellbound
to the relational mindset of the 1970s. New paradigms and new software realities
demand new thinking about databases and data modeling.
There are technical challenges, as well. Most prominent are those dealing with the lack
of ACID guarantees. Each NoSQL database makes different promises and tradeoffs
regarding ACID; a wise developer will investigate and weigh them.
Despite all these challenges, I almost invariably turn to NoSQL; the riches are vast!
Ease of development, rapid app evolution. No tables to set up, no foreign keys
required, no object-relational mapping to configure. Just store your stuff! Even if your
stuff is complex objects, view models, lists, encapsulation, polymorphism, and more,
NoSQL works with modern apps.
But it’s more than developer joy; it’s performance and scalability. NoSQL databases
deliberately punt on certain ACID properties and instead gain a powerful scale-out
capability for modern cloud workloads, optimized for fast and efficient read-heavy
apps. In the modern era, reads that take even a fraction of a second reduce user
engagement and, as Amazon famously found, reduces revenue. Speed and scalability
and developer joy — these are the rich gains that we have found to be well worth the
challenges

The Biggest Challenges of Moving To NoSQL - DZone Database

Загружено:

Сведения о документе

Оригинальное название

Авторское право

Доступные форматы

Поделиться этим документом

Поделиться или встроить документ

Параметры публикации

Этот документ был вам полезен?

Это неприемлемый материал?

Авторское право:

Доступные форматы

The Biggest Challenges of Moving To NoSQL - DZone Database

Загружено:

Авторское право:

Доступные форматы

9/25/2017 The Biggest Challenges of Moving to NoSQL - DZone Database

The Biggest Challenges of Moving

Like (5) Comment (0) Save Tweet

Traditional relational databases weren’t designed for today’s

Let's dig in.

Breaking Out of the Relational-Only Mindset

In one app that I migrated to NoSQL, we had a table ManufacturingCompany and a

In object-oriented programming, we have a term for this: encapsulation. NoSQL

Freedom! Store Arrays, View Models, and Complex Objects

Leveraging Non-Integer Keys

For example, in RavenDB, it's common to see documents with IDs

Maintaining Identity Within Non-Roots

In the previous case, we had moved Contact objects as child objects of a

Yes, Relationships Are Permitted in NoSQL

No Transactions Unless You’re Sophisticated

Other NoSQL databases offer different kinds of querying guarantees. MongoDB

Planning for Orphaned Data

Lack of Triggers; Overcome With Listeners

RavenDB has a built-in feature known as Subscriptions, which allows you to

Patching and Model Schemas

Moving an Existing SQL App to NoSQL

Alternately, NoSQL databases typically provide some form of writing documents in

Вам также может понравиться