
Agile 2008 Conference

Bootstrapping Scrum and XP under crisis


A story from the trenches

Henrik Kniberg, Reza Farhang


Crisp AB, Sweden
henrik.kniberg@crisp.se, reza.farhang@crisp.se

Abstract

During 2006 Tain, a Swedish gaming company, underwent a fast and dramatic agile transitioning process driven by a crisis situation. Many of the lessons learned are described in the book “Scrum and XP from the Trenches” [1]. This experience report focuses on the actual bootstrapping process – the critical decisions and changes made during the first few months that ultimately transformed a burning and sinking ship into a fairly well-oiled agile software development organization.

1. Introduction

Tain is a gaming software company based in Stockholm. In December 2005 Henrik was contracted as chief of development to improve the development process. At that time Tain’s largest and most business-critical project was in a state of crisis. The purpose of the project was to develop and launch an online poker gaming system, which at the time involved 42 developers and testers (more than half of the entire development organization). Reza was leading the 6-man client team in the poker project.

2. The crisis

The online poker system had been launched to production prematurely a few months earlier. It crashed many times every day, causing immense negative publicity and loss of money to Tain, and consequently immense stress on the pokerdev team.

The strategy so far had been to throw bodies at the problem – get more people in, work longer hours. PDD – pain driven development. This approach proved to be expensive, unhealthy, and ineffective.

When Henrik arrived on Dec 10, 2005, pokerdev was essentially a team of ghosts – pale, stressed, sleep-deprived developers and testers. Several people had already been burned out and left the company. The rest worked insane amounts of overtime. One person even submitted a formal vacation application for the upcoming weekend – this was symptomatic.

On December 15 Tain hosted Poker SM, the Swedish national online poker championship, a highly publicized event involving thousands of players and a prize pot of €320,000. In the middle of the tournament the poker system crashed, causing the entire event to be canceled. The media had a frenzy.

The media was notified that Poker SM would be re-run on Feb 1.

A few days after the crash top-level management decided that they could no longer afford so many contractors. Henrik was notified that over half of the pokerdev team would have to be terminated on short notice. At the same time several new developers would be hired.

3. Our mission

Our mission was fairly clear. In priority order:

· Succeed with Poker SM on Feb 1 (six weeks away)
· Make the poker system stable
· Make the poker system scalable
· Create an effective and humane work environment

We knew that if we didn’t succeed with the first goal we wouldn’t be given another chance. If we failed the pokerdev team would most likely be fired and the company’s survival would be at stake. We also knew that, even if we did succeed with Poker SM, we needed to succeed with the other three goals as well within a few months or face another crisis.

978-0-7695-3321-6/08 $25.00 © 2008 IEEE
DOI 10.1109/Agile.2008.34

4. Our situation

4.1 Interview results

It was impossible to lead the team without understanding the current situation better, so Henrik started by interviewing most people involved. Developers, testers, stakeholders, the product owner, etc.

The interviews took a couple of weeks. During this period Henrik started understanding who the key people were and spent time discussing possible strategies with them. This was how he discovered that Reza was a strong informal leader, so from that point on Reza and Henrik essentially worked as a pair – Reza from “inside” as developer and team leader, Henrik from “outside” as manager and coach. This turned out to be a fruitful combination.

The interviews revealed a surprisingly consistent and dismal picture. The most frequently repeated comments were:

· “I’ve been doing overtime consistently now for many weeks, I am about to crash and burn.”
· “I have no idea what is going on – who is doing what, or what I am supposed to be doing”
· “This is the worst code I’ve seen in my entire life”
· “I can never get anything done because I keep getting interrupted by system crashes or changed priorities”.

At the same time it was surprising to hear that most team members were actually highly motivated, and had some fairly good ideas about how they could make the system more stable. They cared about the product. They liked poker. They wanted to succeed!

4.2 Organization

The pokerdev organization looked something like this:

[Figure: PokerDev organization. Teams: Server gamelogic, Server platform, Backoffice, Client, DB, Test. Stakeholders: Marketing, Support, Operations, CFO, Customers.]

Teams were divided by component or role. There was an open office area with 4 teams:

· Server gamelogic team
· Server platform team
· Backoffice team
· Client team

Two additional teams had their own rooms:

· DB team
· Test team

There were several people that weren’t associated with any specific team and had unclear roles – for example “application manager” and “release manager”.

Requirements and requests were continuously raining in from different stakeholders such as marketing, support, operations, the CFO, and customers. Each with their own agenda. Stakeholders would often physically enter the room and make demands directly to specific developers.

Releases and patches were made on a team-by-team basis without adequate testing and integration, causing vicious release-crash-patch cycles. There was a sense of panic in conjunction with each release. There was no stable baseline to release from, instead each release was manually assembled as a patch.

The testers had very little communication with the developers, despite the fact that their room was only 20 meters away.

4.3 Summary of our situation

Forces working against us:

· Severe staff reduction. Pokerdev team was being reduced from 42 to 16 due to quick roll-out of contractors. 4 new employees were starting during January.
· Overworked team, at the brink of burn-out.
· Chaotic and ineffective process.
· Short time frame – 6 weeks.
· Immense external pressure.
· Terrible code base.

Forces working to our advantage:

· 100% support from top-level management.
· Motivated team – they wanted to succeed.
· Single clear goal shared by the whole team.
· Willingness to change.

5. Our strategy

Our initial strategy was very simple:

Implement Scrum fast!

The reasoning behind this was:

· We didn’t know the best solutions to our problems, so we would need to experiment and adapt.
· The pokerdev team appeared to have the motivation and skills necessary to solve their problems. They just weren’t being given a chance to focus.
· We needed to do something fast. The situation could hardly get any worse, so any kind of change would be good.
· One of the other product teams (Casino development) had been using Scrum successfully for a few months already. A small team, but they were happy with it.
· Henrik had positive experience with agile methods from previous engagements and believed Scrum was a good fit to the types of problems this organization was facing.

6. Getting our nose above the waterline

6.1 First step – product backlog

There was no point doing anything else until we knew our focus and priorities. So we started by trying to create a product backlog.

Here we were lucky. The poker product management group was strong, they were domain experts and experienced in Scrum. Since there were several people in the group we managed to agree on having one person be the official product owner for poker software development.

We already knew the primary goal – make the system stable enough to run Poker SM.

Through dialog with key developers and the product owner we quickly came up with a list of key issues that needed to be fixed in order to increase stability. Those became the top items in our product backlog.

It became quite clear at that point that the team had been wasting a lot of time developing stuff that was quite unimportant compared to the stability issue.

6.2 Second step – focus

In addition to the top items, the product backlog quickly filled up with hundreds of other features and improvements and bug fixes that various stakeholders had requested (including the developers themselves).

Here we made a key decision and communicated to the whole company:

The pokerdev team will now focus 100% on system stability. Until that is fixed we won’t work on anything else. All requests must from now on be brought to the product owner, not directly to the team. The product owner will place each request in the product backlog and assign a suitable priority, which means terribly low priority for everything except stability improvements.

This generated a lot of friction. But we managed to hold the fort thanks to support from top-level management, good work from the product owner, and general awareness throughout the company that we needed to focus to survive.

6.3 Third step – reorganize

We still had four problems in our way:

· It was not feasible for everybody to work on stability issues. So what should the rest be doing?
· How could we focus? The system was still crashing every day, causing immense disruptions as the developers had to scramble to implement hot-fixes.
· No reliable release process and no stable baseline to release from.
· Overworked team.

Our strategy was to completely reorganize the teams.

· Move everybody into the open office area, bring the testers out of their cave and seat them with the developers.
· Create two cross-functional teams:
  o One “fire prevention” team that focuses on fixing the underlying stability problems.
  o One “fire fighting” team (called the support team) that focuses on protecting the first team.
· Make the test team virtual. Each tester is dedicated to one of the two cross-functional teams. But the test lead can pull together the entire test team when necessary to do full regression testing in conjunction with major releases.
· No more overtime. As manager, Henrik explicitly discouraged (but didn’t forbid) people from working evenings and weekends. Working smart was more important than working hard.
· No releases or patches could be done without involving the test lead and product owner. The test lead should at least know the quality of each release, and the product owner has final say on whether the release should be made or not.
· All teams would reserve time to fix the baseline, improve our version control system, and find a better release process.

The fire prevention team would be a full-fledged Scrum team.

The support team, however, would be almost 100% reactive – continuously fixing problems as they occurred. The support team couldn’t have a product backlog and had a planning horizon of at most one day, so we concluded that all-out Scrum wouldn’t be useful to them. They did however do daily meetings where they discussed and prioritized today’s work, so in a sense they were doing one-day sprints.

We rearranged the office so that the support team was seated physically between the scrum team and the door. That meant that for someone to disturb the scrum team they would have to physically walk through the support team which invariably led to a friendly “Hi, how can we help you? Those guys in there really shouldn’t be disturbed.”

The new organization looked something like this:

[Figure: PokerDev organization. Support team (fight fires!), Scrum team (prevent fires!), Test team (virtual), Product owner, Stakeholders.]

6.4 Fourth step: Start first sprint

The first sprint started on Jan 2 with a clear goal that couldn’t be misunderstood: Poker SM.

More specifically: release a stable enough version of the poker system that Poker SM will work on Feb 1 without incident.

Reza was Scrum Master of the fire prevention team, with Henrik close behind as coach.

It was not a clean by-the-books sprint. It was messy. The support team managed to hold off many disturbances, but not all. Sometimes both teams had to get involved in patching and problem solving.

There were staff changes. Contractors being rolled out. New employees being rolled in. The size of the Scrum team started at 10 but was 15 people at peak. The daily scrums often took 30 minutes or more.

We didn’t have a clear end to the sprint. Did the sprint end on Feb 1, in conjunction with Poker SM? Or did it end a week before, when the actual release was made? Or sometime in between?

Despite the messy nature of the first sprint, it was a world of difference compared to before. The whole pokerdev organization, every single person, was aligned to a single goal. We had two teams with different roles but a shared goal. There would be no partial credit – we would all succeed or fail together.

Although the fire prevention team was disturbed from time to time they did manage to focus enough to make critical improvements to the system and make several new releases just in time for the tournament.

6.5 Poker SM – 1 Feb

The tournament succeeded. The relief was indescribable. Failing was just not an option.

As with most victories, luck had its part. The system did continue crashing once in a while even after the tournament. But not as often as before.

We had succeeded with our first goal! The other three goals remained.

· Succeed with Poker SM on Feb 1 (done)
· Make the poker system stable
· Make the poker system scalable
· Create an effective and humane work environment

7. Spawning more Scrum teams

After the first sprint (actually even before it was finished) we started spawning additional scrum teams. This was driven by three facts:

· The system was getting more and more stable, so we didn’t need as many people in the support team.
· We noticed (to nobody’s surprise) that large teams don’t work too well.
· After each sprint with a too-large team we learned more about how that team could be split up.

The diagram below shows the evolution of our Scrum teams.

[Figure: timeline of Scrum teams and sprints, weeks 1 to 19.]

The diagram is a timeline. Each column represents one week, each row represents one team.

Each box represents one team doing one sprint. The width of the box is the sprint length. So the first team did a 4-week sprint, then three 2-week sprints, then a 3-week sprint, etc.

As shown in the diagram we spawned a new team after every one or two sprints until we reached five well-balanced teams in week 14. We experimented with sprint length and finally settled on a synchronized 3-week sprint cycle, i.e. all teams start and end the sprint together.

Some teams were truly cross-functional, containing people from client, server, test, DB, etc. Some teams focused on one component, for example the first scrum team evolved to become a server-only team, continuously improving stability and scalability.

Here’s an illustration of how the organization looked around week 12.

[Figure: PokerDev organization. Scrum teams, Test team, Product owner, Stakeholders.]

The support team was now gone. The support load was low enough that each team could leave some slack each sprint for this.

We kept the virtual test team concept, with testers semi-dedicated to each scrum team. This was needed for the time being since the poker system as a whole still needed many hours of manual regression testing in conjunction with major releases, so once in a while the whole test team needed to get together to test a new release.

8. Solving key impediments

8.1 Scrum knowledge

Lack of Scrum knowledge in the teams was an impediment. This was quite easily solved. In early February a group of 5 individuals from different teams attended a Certified Scrum Master course with Ken Schwaber.

The course alone obviously didn’t make them experts at Scrum, but having a trained Scrum Master in each team was enough of a “seed” to make the teams start inspecting and adapting their own process. This freed Henrik from much of the day-to-day coaching and management and allowed him to attack specific impediments instead.

8.2 Versioning and continuous integration

One of our key impediments was that we didn’t have a stable baseline from which to make releases, and that our code trunk was in a horrid state. Releasing was a manual process and therefore time-consuming and error-prone.

We spent immense amounts of time cleaning up our branches, migrating from CVS to Perforce, and changing our whole way of working. We implemented a version control model known as the mainline model and gave each team their own branch.

The article “Agile version control with multiple teams” [2] describes in detail how we did this.

In conjunction with this we also implemented continuous integration using QuickBuild. This reduced, among other things, client build time from 5 hours of manual work to 20 minutes of automated work.

These changes had a significant impact on our ability to maintain a sustainable pace and release code in a reliable way. It also made it easier for teams to work in parallel without stumbling over each other and polluting the trunk.

This improvement to our development environment would have been impossible to complete without stubborn focus from the teams and the product owner, and the support that the Scrum framework gave us. Since everybody agreed that these improvements were crucial to our long-term velocity we were able to allocate time to this on a continuous basis.

Another crucial aid was that Tain had a highly skilled configuration manager, and we managed to have him fully allocated to the pokerdev team for a few sprints to help us complete these improvements.

8.3 Release process

As we implemented Scrum we realized that our release process was severely dysfunctional. The operations team was in Malta and there was no single person there responsible for the poker product. Instead they had teams of people working in shifts handling multiple products on a 24-hour basis, and relied on detailed documentation from the development team for each release.

Despite many attempts to improve the hand-over process there were frequently mistakes made during the release process, causing costly roll-backs and downtime for the poker system.

This turned out to be quite easy to solve. All we did was shift focus. “Individuals and interactions over processes and tools”.

We threw away the detailed release process documents and templates describing exactly how the hand-over should be done. Instead we demanded that Malta provide one person responsible for poker releases, and named him “Deployment Coordinator, Malta” (DC). We then allocated a corresponding person on our side and called him “Deployment Coordinator, Stockholm”. Our test lead took on the DC Stockholm role, which was a good match.

DC Stockholm and DC Malta were jointly responsible for executing releases. They could use whatever process they liked – but there was no hand-over of responsibility. They worked together until the release was complete.

We also clarified that release is top priority at all times. The two DCs could request whatever help they needed to ensure that a release was successful.

Over time the two organizations – development in Stockholm and operations in Malta – grew closer and started learning each other’s domains better. This type of clear but informal collaboration, not detailed process documents and hand-overs, was what was needed to ensure friction-free releases.

8.4 Test automation & technical debt

Another key impediment was the terrible condition of the code. There were no tests at all and it was impossible to understand all the criss-crossing and circular dependencies riddling the code.

The background to this was the fact that the code was initially purchased from another company, already in a bad shape, and then attacked during several months by over 60 desperate developers working overtime to make the system work in time for an unreasonable release date in fall 2005. Not only did the release fail to work properly, the code was irreparably damaged and the system crashed every day.

We often considered rewriting the whole system, but decided each time that it was too risky.

Instead we decided, first of all, to stop making things worse. We trained the whole team in test-driven development (TDD), instituted a rule that all new code should have tests, and that each team should slow down enough that they have time to write tests and gradually refactor and improve the code base.

This was made possible by the Scrum rule that a team chooses how many backlog items to pull into a sprint. That enables them to leave room for improving the code base.

We negotiated with the product owner and agreed that roughly 20% of our time on average would be spent refactoring the code and improving test coverage.

Some of the refactoring work would be quite complex, sometimes requiring significant changes to the architecture. In order to coordinate these efforts we introduced a “tech backlog” and a “poker architect” role.

The tech backlog listed the key improvements that needed to be made, in priority order. The poker architect was essentially the “internal” product owner who owned and prioritized the tech backlog (in addition to doing development in his Scrum team of course). This freed up the main product owner from having to understand and prioritize technical refactoring tasks.

This agreement served us well. The product owner knew that the team spent roughly 80% of their effective time focusing on the product backlog, and roughly 20% implementing long-term improvements to the code base. We all believed that this was the only way to achieve a good velocity in the long term.

Although these rules weren’t always strictly followed or enforced, the intent was clear. It took several months to see the results, but by May it was fairly clear to everybody that both the quality of the code and our velocity had noticeably improved.

The tech backlog was a temporary solution. Once the most important items had been fixed the tech backlog was no longer needed, and the team could revert to more spontaneous refactoring on a continuous basis.

It was clear that the team had now taken ownership of the codebase. Instead of complaining about the codebase they were improving it.

During one period we had a dedicated scrum team creating a test automation framework. For various reasons this didn’t work out too well; among other things it was hard to define clear goals and follow up their work.

9. Delivering business value

During Jan and Feb we focused almost exclusively on stability and scalability; all new features had to wait.

By March the system was stable enough that we were able to allocate some teams to develop new features.

This was a critical milestone from a business perspective – the ability to actually start delivering new business value and not just repaying technical debt.

By the end of April we finally completed and released a clustered version of the poker system. This gave us the scalability that we needed.

The cluster was also supposed to give us better stability, since when one node went down players would be migrated to another node and could continue playing. Ironically, the nodes themselves were now stable enough that the failover functionality was hardly ever needed.

10. Where we ended up

By May the system was stable and scalable. The teams were working normal office hours, even taking time off regularly to improve their skills. We socialized and had fun.

Compared to December the team had now been reduced to half the size and was working about half as many hours, but was producing significantly more business value than before. Very interesting to see!

Needless to say, we had completed our goals and were happy.

· Succeed with Poker SM on Feb 1.
· Make the poker system stable.
· Make the poker system scalable.
· Create an effective and humane work environment

11. What happened after

Inspired by the success of the poker project, Reza took over responsibility for the whole pokerdev team while Henrik moved on to implement Scrum in the rest of the organization. Many of the lessons learned from that are described in the book “Scrum and XP from the Trenches” [1].

Since then Henrik and Reza have moved on and helped several other companies transition to agile software development, and learned that most lessons from Tain still hold true.

12. Lessons learned

The most important take-away points from this experience are:

· Scrum works. A project might fail despite Scrum, but it will not fail because of Scrum. A process can’t guarantee success, but we saw clearly that even a great team will fail if the process is dysfunctional.
· Scrum is simple but hard. It’s like chess – the moves and rules are quite simple, but to get good at it you need to practice, learn some standard openings, and study professional matches.
· Technical problems are caused by process problems. Focus on improving the process, and the technical improvements will follow. In our case XP followed naturally in the wake of Scrum.
· A good product owner and scrum master are critical. Without a good product owner you don’t get focus. Without a good scrum master you don’t get process improvement. Both are critical to success. Spend the time and money necessary to find a good PO and SM.
· Don’t throw bodies and hours at the problem
  o If the process is dysfunctional, adding more developers makes development go slower, not faster.
  o Working overtime on a regular basis makes development go slower, not faster.
· Get an experienced coach. This experience report and many others confirm the fact that an experienced agile coach is an important factor for succeeding with agile transitioning. This is especially important when the change needs to be done quickly.
· Beware of framework teams. Teams that don’t focus on direct business value should be managed extra carefully using clear goals and tight feedback loops. Alternative approaches should be considered.

· Experiment! Don’t spend too much effort trying to get everything right from start. Here are some examples of important lessons specific to our context that we probably wouldn’t have learned without experimenting.
  o Take testers out of their room, place them with scrum teams, but keep them as a virtual test team that can be pulled together for regression testing.
  o Have Deployment Coordinator roles instead of a detailed written release process.
  o Use a tech backlog to coordinate major refactoring work.
  o Export the product backlog to physical index cards during sprint planning.
  o An effective way to do sprint planning with multiple teams is to rent a large conference room offsite and do all teams at the same time.

13. What we would have done differently

When implementing Scrum and XP in the rest of the organization we gained some knowledge and experience that would have been useful to apply earlier.

If we could go back in time to December 2005 we would do many things the same way, but some things differently:

· Introduce pair programming earlier.
· Skip the test framework team, or at least equip them with clearer goals.
· Use physical taskboards for the sprint backlogs from the beginning, instead of Excel spreadsheets. We discovered later that taskboards were much more effective.

14. References

[1] H. Kniberg, Scrum and XP from the Trenches, C4Media, Stockholm, 2007.

[2] H. Kniberg, “Agile Version Control with Multiple Teams”, InfoQ, http://www.infoq.com/articles/agile-version-control, March 2008.

