How a single engineer brought down Twitter
Twitter’s website is breaking in novel ways, and while the company managed to recover from its latest outage within a couple of hours, the story behind how it broke suggests that similar problems are likely in the near future.
On Monday morning, Twitter users logged on to find a thicket of connected issues. Clicking on links would no longer open them; instead, users would see a mysterious error message reporting that “your current API plan does not include access to this endpoint.” Images stopped loading as well. Other users reported that they could not access TweetDeck, the Twitter-owned client for professional users.
Chaos took over the timeline, as users tweeted vociferously about the outage — often illustrating their points with images that no one could see because they wouldn’t load.
In a tweet, the company offered the vaguest of explanations for what was happening.
“Some parts of Twitter may not be working as expected right now,” the company’s support account tweeted. “We made an internal change that had some unintended consequences.”
The change in question was part of a project to shut down free access to the Twitter API, Platformer can now confirm. On February 1st, the company announced it would no longer support free access to its API, which effectively ended the existence of third-party clients and dramatically limited the ability of outside researchers to study the network. The company has been building a new paid API for developers to work with.
But in a sign of just how deep Elon Musk’s cuts to the company have been, only one site reliability engineer has been staffed on the project, we’re told. On Monday, the engineer made a “bad configuration change” that “basically broke the Twitter API,” according to a current employee.
The change had cascading consequences inside the company, bringing down many of Twitter’s internal tools along with the public-facing APIs. On Slack, engineers responded with variations of “crap” and “Twitter is down – the entire thing” as they scrambled to fix the problem.
Musk was furious, we’re told.
“A small API change had massive ramifications,” Musk tweeted later in the day, after Twitter investor Marc Andreessen posted a screenshot showing that the company’s API failures were trending on the site. “The code stack is extremely brittle for no good reason. Will ultimately need a complete rewrite.”
Some current employees are sympathetic to that view, which places at least part of the blame for Twitter’s problems on technical failures that predate Musk’s ownership of the company. The fail whale became an icon of the old Twitter for a reason.
“There’s so much tech debt from Twitter 1.0 that if you make a change right now, everything breaks,” one current employee says.
Still, when Musk took over the company, he promised to dramatically improve the speed and stability of the site. His associates screened the existing staff for their technical prowess, ultimately cutting thousands of workers who were deemed not “technical” enough to succeed under Musk’s leadership.
But nonstop layoffs have left the company with under 550 full-time engineers, we’re told. And just as former employees have predicted from the start, the losses have made Twitter increasingly vulnerable to catastrophic outages.
Monday’s errant configuration change caused at least the sixth high-profile service outage at Twitter this year.
“This type of outage has become so frequent that I think we’re all numb to it,” a current employee says.
And those are only the service outages. Other issues, such as the one that led Musk’s tweets to be made more visible on the timeline than any other user’s, have also roiled the user base.
In many ways, Monday’s outage represented the culmination of Musk’s leadership at the company so far. In a single-minded effort to cut costs on his $44 billion purchase, he has been slashing the staff and reducing Twitter’s free offerings.
This paved the way for a single engineer to be staffed on a major project — one that is linked to several critical interconnected systems that both users and employees depend on.
And with few knowledgeable workers on hand to restore service, it took Twitter all morning to fix the problem. “This is what happens when you fire 90 percent of the company,” another current employee says.
Inside Twitter’s HQ, however, the mood was almost light. “We’re laughing all the way down,” says a different current employee.