Talk:Data breach

From Wikipedia, the free encyclopedia

Wiki Education Foundation-supported course assignment

This article was the subject of a Wiki Education Foundation-supported course assignment, between 6 September 2020 and 6 December 2020. Further details are available on the course page. Student editor(s): Rc4230.

Above undated message substituted from Template:Dashboard.wikiedu.org assignment by PrimeBOT (talk) 19:49, 17 January 2022 (UTC)

Wiki Education Foundation-supported course assignment

This article was the subject of a Wiki Education Foundation-supported course assignment, between 28 May 2019 and 2 July 2019. Further details are available on the course page. Student editor(s): Moli chen.

Above undated message substituted from Template:Dashboard.wikiedu.org assignment by PrimeBOT (talk) 19:03, 16 January 2022 (UTC)

Data "Spill"

I work in the security industry and I have heard of data breaches but never a "data spill". The redirect from "Data Breach"->"Data Spill" is backwards.

The first paragraph, which explains that the name is "ironic", does not satisfy me or make any sense. So I'm going to switch this. DouglasHeld (talk) 17:21, 25 July 2009 (UTC)

From the perspective of the data steward, the preferred term is breach, casting attention on the illegal nature of the activity, which deflects blame. From the perspective of the black hat, the preferred term might be data extraction or data liberation. From the perspective of the disempowered consumer caught in the middle, data spill was a fine term, having little insight as to the competence or motive on either side. As ever, few inside stakeholders adopt the lingo of the disenfranchised.
I liked the term spill because it was neutral on the malice/incompetence question, and generally there is plenty of both. In the oil patch, it's a spill whether someone drops a wrench or fires a rifle into a tank. I don't think breach is well suited to the case where confidential data is carelessly exposed when remaindered equipment is auctioned off without the drives having been properly wiped. Big data corporations are supposed to know better, and so is Exxon. — MaxEnt (talk) 23:01, 21 January 2010 (UTC)

OFS Data Loss database

There is a list of incidents by number of records, with references, at http://datalossdb.org/, including many large spills not listed here.--Langhorner (talk) 22:41, 16 February 2009 (UTC)

Broken Link

Source #1 has a broken link. Many of the examples of data breaches link to this, so alternate sources need to be found. — Preceding unsigned comment added by Treyofdenmark (talk · contribs) 02:31, 21 November 2013 (UTC)

Federal Employees data breach

I removed a section about a data breach relating to four million U.S. Federal employees. It was written in a politically editorial style. However, it was certainly relevant, and it would be good to include a section on government data breaches that sticks closer to just the facts. — Preceding unsigned comment added by 207.191.31.200 (talk) 14:04, 29 June 2015 (UTC)

"Major incidents" section redundant due to List of data breaches?

I'm wondering if it would be best to remove the "Major incidents" section and instead move its content to a new column of the table of the List of data breaches. What do you think?

--Fixuture (talk) 18:03, 23 September 2016 (UTC)

The List table is actually intended to only include "major" breaches, those with over 30,000 records, which is noted in the table's intro text. In any case, tables are always tricky to edit so keeping things simple is always helpful. --Light show (talk) 18:31, 23 September 2016 (UTC)

External links modified

Hello fellow Wikipedians,

I have just modified 2 external links on Data breach. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at {{Sourcecheck}}).

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 18 January 2022).

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—InternetArchiveBot (Report bug) 07:51, 7 December 2016 (UTC)

External links modified

Hello fellow Wikipedians,

I have just modified 2 external links on Data breach. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—InternetArchiveBot (Report bug) 06:10, 5 September 2017 (UTC)

Recent Pentagon Breach

Unless there isn’t enough information on this topic (which I highly doubt given how long ago it was), I believe we should add that here.

We can always build on it later… Stevemc01 (talk) 22:45, 30 April 2023 (UTC)

Wiki Education assignment: Research Process and Methodology - FA23 - Sect 202 - Thu

This article was the subject of a Wiki Education Foundation-supported course assignment, between 6 September 2023 and 14 December 2023. Further details are available on the course page. Student editor(s): Wobuaichifan (article contribs).

— Assignment last updated by Wobuaichifan (talk) 23:08, 26 October 2023 (UTC)

Major incidents

Since Wikipedia is not a collection of news stories, items in the "Major incidents" section should only include those with their own Wikipedia article (thus, "major") and not just any of hundreds of breaches that happen to have a news story written about them somewhere. I'm going to start cleaning this up in the next few days. --ZimZalaBim talk 15:45, 5 December 2023 (UTC)

Adding a notable data breach

Should the recent Traderie data breach be added? Here are some sources I found: https://techcrunch.com/2023/09/07/traderie-a-marketplace-for-in-game-items-alerts-users-to-data-breach/ https://www.scmagazine.com/brief/data-breach-disclosed-by-traderie https://www.techradar.com/pro/security/gaming-marketplace-traderie-alerts-users-to-data-breach

Others were Reddit posts, not really notable sources, but they still suggest that many users were heavily alarmed: https://www.reddit.com/r/RoyaleHigh_Roblox/comments/16g76ws/there_has_been_a_huge_traderie_data_breach_and/ https://www.reddit.com/r/Nookazon/comments/16b6ekw/is_this_just_me/ https://www.reddit.com/r/Diablo_2_Resurrected/comments/16avuc2/traderie_has_had_a_data_leak/ https://www.reddit.com/r/Diablo_2_Resurrected/comments/166ktzs/traderie_compromised/ Cometkeiko (talk) 17:47, 6 December 2023 (UTC)

Edit request

Please replace the content of this page with User:Buidhe paid/sandbox3.

Reasons: rewrite article according to recent reliable sources, add sections about history and prevalence, perpetrators, causes, prevention and response, avoid duplication with List of data breaches. Buidhe paid (talk) 17:00, 25 March 2024 (UTC)

@Buidhe paid: I've started in on this task but it's going to take me much longer than expected to responsibly move the old information out to other articles before I can perform your wholesale re-write. I also note that there is some preexisting useful cited content. I'm hesitant to simply replace it. I see that the sources you've provided are books but you missed a few key sentences which can probably stay. It may be a couple days depending on how much time I can devote to this. Chris Troutman (talk) 21:13, 25 March 2024 (UTC)
@Buidhe paid: Done Chris Troutman (talk) 18:10, 26 March 2024 (UTC)

GA Review

This review is transcluded from Talk:Data breach/GA1. The edit link for this section can be used to add comments to the review.

Nominator: Buidhe paid (talk · contribs) 20:47, 26 March 2024 (UTC)

Reviewer: Chipmunkdavis (talk · contribs) 14:07, 31 March 2024 (UTC)

Starting to look at this one. Initial impression is that it's surprisingly short, given what feels like a large topic. Will be looking at broadness rather than comprehensiveness, so just an initial note. Another first impression is that it seems written from a US-centric perspective (and the 2005 date here contradicts the linked Security breach notification laws, have not yet checked sources to dig into this). The sources look modern and reliable, the images are all PD and created by the nominator. This was a recent complete overhaul so stability is not immediately obvious, although it seems a clear improvement on the previous version and the overhaul was performed by another editor. I likely have limited to no access to the majority of these sources, but will have a more detailed look at this later. CMD (talk) 14:07, 31 March 2024 (UTC)

I can send pdfs of the sources if you want. (t · c) buidhe 16:42, 31 March 2024 (UTC)
Buidhe Thank you for the offer, perhaps if you could share Fowler 2016 and Solove & Hartzog 2022 that would be very helpful. CMD (talk) 09:23, 1 April 2024 (UTC)
CMD, Buidhe, any further progress on this review? Ideally it should be wrapped up pretty soon. —Ganesha811 (talk) 15:42, 14 April 2024 (UTC)
I sent CMD the materials but he seems to be busy right now. It's not a huge rush (t · c) buidhe 00:47, 15 April 2024 (UTC)

Apologies for the delay, took a while to figure out the article and its sources. Thank you for the materials, used for some of the spot-checks and other notes below:

  • "Since the advent of data breach notification laws in 2005, reported data breaches have grown dramatically." is a very odd second sentence. It feels like referring to a national set of laws? Further, is the reporting linked to the laws, or does it simply reflect the internet becoming more important and widely used? Is 2005 worth emphasising at this point in the lead?
    • Rewrote
  • "Data breaches are most commonly caused either by a targeted cyberattack, an opportunistic attack, or inadvertent information leakage" could use some more explicit expansion in the body, it is mostly implied.
    • Rewrote
  • "Data breaches are most commonly caused either by a targeted cyberattack, an opportunistic attack, or inadvertent information leakage." This seems an odd collection of items. None are used elsewhere in the article, and I'm not intuiting a significant difference between a targeted and opportunistic attack.
    • The difference between opportunistic and targeted attacks is covered in the perpetrator section—essentially when any target will do versus when the attacker wants to attack a particular system. However, I removed from the lead as potentially unnecessary.
  • "...including accidental disclosure of information" does not seem different from the earlier "inadvertent information leakage"
    • Rewrote
  • Regarding long-term risk as covered in "people whose data was compromised are at elevated risk of identity theft for years afterwards and a significant number will become victims of this crime", this will depend on the data being leaked; the risk is mostly elevated only with PII data. (A leak of hashed/salted passwords is a data breach, but not helpful for identity theft.) It is perhaps worth specifically mentioning it is Personal/PII data.
    • A leak of hashed passwords would not qualify as a data breach by most definitions, but I've edited to clarify.

Definition

  • The note that definitions vary should perhaps be placed before the giving of a specific definition.
    • I wrote it like this because the definition initially given is essentially the one used by the sources and covering the article's scope, although exact details may vary.
  • How does the note on company disclosure fit within this section?
    • While some definitions include other information, the laws (entirely) and sources (almost entirely) discuss breaches of personal information. I put this in the definition section to hint at the article's actual content and scope without going into OR.

History and prevalence

  • Data breach notification laws notes laws were first put into place in 2002, although perhaps a different nuance/focus of the laws. Looking into the source, 2005 does appear to match the "widespread" part of your text. The text should be adjusted however to note the US-focus, being widespread only in that country.
    • After more research, I think the source is wrong and I'm fixing it accordingly.
  • More information on other countries seems like it is needed here, GDPR is mentioned in a later section but surely it would fit into history?
    • Moved
  • Speaking also to that source, it places emphasis on California as the leading state. Given the prominence California played and still plays in tech, it is probably due a mention in the second paragraph where the "legislatures around the United States" history is given.
    • Done
  • "In 2016, researcher Sasha Romanosky estimated that data breaches outnumbered other security breaches by a factor of four." Noting that in the source this excludes phishing, which is included as a data breach cause in a later section.
    • Clarified the definition he is using.
  • "In the 2000s, the dark web—parts of the internet where it is difficult to trace users and illicit activity is widespread—began to be set up, increasing in the 2010s with the advent of untraceable cryptocurrencies such as Bitcoin. Information obtained in data breaches is often offered for sale there." While this is all true, it seems incomplete to jump to this without noting that this data can also be available on the regular internet and its many forums. The source does mention illicit marketplaces in general before focusing on the Silk Road/dark web. (The Silk Road (marketplace) is actually probably due an explicit mention, although this is slightly beyond GACR consideration.)
    • I checked multiple sources and none of them said much of anything about non-dark web forums for selling data. However, I revised the text to avoid any implication that the dark web is the only forum for sale.
  • Similarly, there are other platforms for information sharing beyond the dark web these days, Telegram (software) is a common one and it feels the article could mention something about this.
    • Even specifically searching for Telegram's use regarding data breaches I cannot find much of anything, so I think it is likely UNDUE.
  • The mention of ransomware feels off-topic.
    • Maybe I could be clearer on this, but ransomware can qualify as a data breach according to many definitions, because it results in "the unauthorized... loss of personal information". It is definitely covered by multiple sources as a type of data breach.
  • Regarding this section I almost think it would be better to narrow it to just "prevalence" or split it to different parts of the article, so we wouldn't have the legal information or technologies used to perpetrate breaches in two different places. Buidhe paid (talk) 17:38, 22 April 2024 (UTC)

Perpetrators

  • The second paragraph of Perpetrators seems to be more about Consequences? I am also unsure if the methods of communication are relevant to this article.
    • Moved to consequences
  • "The threat of data breach or revealing information obtained in a data breach can be used for extortion, often using ransomware technology (where the criminal demands a payment in exchange for not activating malicious software)." I am not sure about this link. As with the above mention of ransomware, ransomware is usually a process where a hacker will encrypt data; it is not a function of data leaks but of other security challenges. As an aside, for this source, the page is given as 14, but it seems to be citing 13.
    • Removed.

Causes

  • It is probably worth noting, given the first point in Causes is about encryption, that phishing and other methods get around encryption. It should probably be clear that the non-encryption data breaches are the low-level opportunistic breaches, and that encryption is just one interlinked tool with the other causes. Also worth noting that the wording seems targeted only at symmetric cryptography; asymmetric would have two keys, one for encryption and one for decryption.
    • Added mention of hashing
  • Relatedly, the chart in this section differs from the text. Neither is incorrect, but the text needs to be more holistic. The note at the end about security and zero risk should be expanded upon near the start, highlighting that breaches are a function of there being data at all and that security is about a layered approach of risk management, rather than implying there are clear distinct causes for breaches.
    • Revised the text to not use bullet points, I hope this is what you meant by more holistic. I also added a mention of defense in depth.
  • Regarding the second point on phishing, phishing is a subset of Social engineering (security), and it would be better to discuss that broader concept rather than focusing on just the one method.
    • Done

Breach lifecycle

  • The note on hiring a CISO is another odd textbook-style piece of writing that feels like a diversion from the topic. It basically says that to have security, one should invest in security, which feels like a self-explanatory point. I think that this section on prevention is trying to get at the concept that firms often underprioritize security relative to what they perhaps could (should?), but that does not come through in the language, which reads more as a prescriptive guide to companies.
    • Rewrote
  • Perhaps one way this section could be less-textbook is by noting that there are industry standards/frameworks for data breach response, and work from that framing. Presumably this (eg. NIST compliance) is the expertise provided by the outsourcing. NIST and a couple of other examples are mentioned in Fowler 2016 pg 51 and 210.
    • Done
  • The sentence on the balance of funding vs outcomes is also I feel trying to get at the broader concept of risk management, and how simple steps cause substantial initial rises in security but there are diminishing returns of unit security per unit investment.
    • It seems intuitive that there would be diminishing returns, but I can't find sources that put it that way.
  • The other advice for prevention on paranoia and proactive action also reads as a how-to. I would also suggest that this is well-known within the cybersecurity field, and does not need specific in-text attribution.
    • I actually ended up removing most of it per your comments above.
  • The standalone sentence on data non-collection/deletion is perhaps worth expanding on, but that is perhaps beyond strict GA needs.
    • Expanded a bit
  • "A penetration test can then verify that the fix is working as expected" in the Response section feels like it is explicitly going back to the Prevention section, where pen testing is already mentioned.
    • Penetration test verifies that the system is secure, thus it has a role in both prevention and recovery from attacks.
  • "After the breach is fully contained, the company can then work on restoring all systems to operational" seems another sentence that fits in a how-to guide but doesn't seem necessary for an encyclopaedia.
    • Removed

Consequences

  • I'm having trouble interpreting the graph in Consequences. The dots seem to be aligned as they would in a bar chart by the 2023 number. The 2022 number is in the graph text but doesn't seem represented. I'm assuming circle size is a second indication of scale? And I'm not sure what the lines are trying to tell me. Not strictly a GA issue, but perhaps it could be looked into. A better caption would help a lot here, especially as it is providing definite figures in a section saying the figures are tricky to calculate.
    • I only added the graph because Chris Troutman when implementing my edit decided to keep another graph that was even worse. I couldn't find any really good graphs that seemed informative, so I would be happy to remove it.
  • "Due to increased remediation efforts in the United States after 2014, this risk decreased significantly from one in three to one in seven." This felt unlikely as a statistic, far too high. Checking the source, this appears to be specifically credit card data breaches, rather than data breaches as a whole.
    • Removed this stat as perhaps overly specific.

Laws

  • "Measures to protect data from a breach are typically absent from the law or vague, in contrast to the more concrete requirements found in cybersecurity law." I'm not seeing the distinction here, wouldn't data breach laws fall under cybersecurity? It is a contrast to non-data-related items in those laws, or other laws? (I am unable to figure out which part of the source this stems from.)
    • I'm not entirely sure what they mean by this, so I removed the clause.
  • "Beginning with California in 2005, all 50 states have passed their own general data breach notification laws", perhaps also worth mentioning Alabama was the last in 2018 (same source). Further, perhaps this timeline should be in history and a simple summary noting the current position should be here?
    • Partly done, although I wonder if the history section should be split as I proposed above.

On a general note, there are various examples of breaches scattered throughout the text. It is a surprise that Pegasus (spyware) is not one of them, given its prominence. Overall, the writing of the current text gives off an odd feeling, like a textbook rather than an encyclopaedia. Some specific mentions of this made above, but it's a noticeable tone throughout. At the same time, many areas feel truncated, touching only obliquely on what their topic of discussion is. There is some room for expansion in the body and the lead. Much of the information is US-focused. Ideally it should be globalized for comprehensiveness, however at the very least the facts and figures constrained to the US should clearly reflect this in their presentation in the article. It would be good to get a picture of how closely these items reflect the whole source corpus used. No issues specifically found for NPOV, stability, or image licensing. Best, CMD (talk) 13:33, 21 April 2024 (UTC)

  • Added mention of Pegasus
  • Unfortunately, a lot of the overview sources I could find are distinctly US focused, so it's hard to ensure global coverage. Buidhe paid (talk) 06:41, 23 April 2024 (UTC)
Break
Thanks for addressing or explaining all my points above, I'm impressed how much the article has changed. Summaries and some follow-ups below:
  • Lead is reading well now.
  • Surprised you can't find anything on Telegram, it's getting its own dedicated papers. Perhaps it is still rather new and has not percolated to more general sourcing, so if you feel it is undue compared to everything else in current sourcing that is fine by me. Something to watch out for though.
  • I've looked a bit more into this assertion about ransomware, and would be interested in some explanation. Forgive the non-scholarly sources, but I'm used to distinctions such as this one between the two. However, reading up such as in here, perhaps I am behind the times and the lines between them are blurred as ransomware attacks now sometimes (often? usually?) include data extraction. I do think the article needs clarity on this point to show the relevance; "a type of malware that encrypts data storage" does not fit in with the definition given on the page.
  • Reading the reformulated history and prevalence section, it still feels like a good idea to include a general note from California to Alabama that might bridge the gap before GDPR. Regarding your idea of splitting, I can see how it might work in that case too. If you do split, a simple note of data breaches increasing over time would suffice to frame the legal development if you want to focus on prevalence separate to that.
  • On the cause section, there is still a bit of disconnect between the image and the text. The image causes seem to focus heavily on different types of data storage, which is not covered in the text. Looking at the image on its own, it's not immediately clear why say "Portable device" is different to "Physical loss". My initial guess is that it might be a statement on the security of those devices, but Hacking/malware is a separate category too.
  • "The software vendor is not legally liable for the cost of breaches, thus creating an incentive to make cheaper but less secure software", would help to specify if this is a particular jurisdiction. If it is the US, then perhaps wording such as "Even in the US..." would help while also linking it to the legal history already covered and thus providing implications that there is less elsewhere.
  • I really like this split into technical and human causes, including the linking of the two at the end. The rewrites in Breach lifecycle were also very effective.
  • I would remove the graph in Consequences for now, although if remade into a simpler bar graph it may be quite effective.
CMD (talk) 12:28, 29 April 2024 (UTC)