As someone who has been the target of an FBI investigation for what was effectively criminal copyright infringement (later arrested and did time in prison), my only takeaway is that this, if anything, should just be a civil suit just like so many other similar cases of copyright issues.
In my personal experience, the priorities of the FBI are typically highly politically motivated. The exceptions are if you’re doing something seriously icky, or doing fraud that deceives people.
For those interested in what’s reported and what actually happens, I’ve made some comments on my case and my experience here: https://prison.josh.mn
>There's a certain freedom in owning your story publicly. People can't weaponize what you've already made peace with. I think that's what I'm motivated to do here.
Really nice. It also builds some credibility currency, the reputation economy is not as punitive in your case as I thought it would be.
FBI wants to remind everyone that only US mega-cap companies can scrape the entire internet, not share the data, use it to train AI models and then charge people to use chatbots that use this 'laundered' data. Anybody else attempting to do this is a criminal in their eyes and must be punished.
Archive should just rebrand as an AI start up then offer an 'llm' that is suspiciously 'over-trained' and happens to spit out the site you query exactly... Copyright infringement? Nay! Over-training! "A fix is coming soon™!"
Not just that... they want to control and erase history, so they fully control the narrative. Same as the Roman Empire did with the crusades and burning of books.
The FBI is also used by various organizations to investigate various crimes, whether real or not. For instance, the Bureau of Industry and Security thought my company was flouting export regulations and had the FBI raid my company to investigate. It turned out to largely be due to a paperwork problem, but the BIS didn't have the power to investigate so they contracted the FBI to provide the manpower to do the raid. So who we saw was the FBI, but it was really the BIS originating the raid.
Just a note: the White House also uses archive.ph.
Search for “Americans are spending like never before: Retail sales are booming — up 5% over last year, far outpacing inflation — as Americans spend in record amounts.” [1]
Why would it be LLM-assisted when maps of what sites link where are part of the core WWW infrastructure? Google made a trillion dollar business out of that.
I occasionally read these articles and wanted to know what sources they use, besides websites like The Daily Caller, to back up their claims. I noticed this some time ago and remembered it. But it took me a while to find the article again. ;)
I pay subscriptions to some of these sites and still use archive.is on them because it is a more pleasant reading experience. No auth failures, no annoying popover windows begging me to subscribe to their dumb newsletter. Just the internet equivalent of a static piece of newsprint.
my personal theory is that archive.is has paid subscription accounts (legit or via botnet) to most of the major news outlets and edits the html to make the sites look not logged in. I wonder if they do it by hand or by doing something like : https://github.com/pirate/html-private-set-intersection
It is definitely more than that for some sites and it has to be manually managed. For example, this year I've seen archive.is capture paid articles of some Finnish newspapers, and the layout gives away that it is logged in on an account although the identifying details have been stripped out.
There have been periods of weeks/months when they don't have paid access to those Finnish sites. Tried it just now on a hs.fi paid article from today and it didn't work, but for example paid articles from just a week ago seem to have been captured as a premium user.
It is curious how they have time to do it and I wonder if news sites of other smaller languages get similar treatment.
Is there any more annoying popup than the newsletter popup? I'd rather see a targeted ad than that BS.
NO! I do not want your newsletter! I wouldn't even have an email address if it wasn't absolutely required to operate in society today. The less email I get, the better!
Email is becoming like fax machines: An old, dated technology that refuses to die.
> Is there any more annoying popup than the newsletter popup?
Rhetorical I'm sure, but actually yes! The popup that tries to get you to switch to the app when you are actively trying to give the offender money! eBay is a notable offender here (it pops up when I search for stuff to buy; why would you interrupt that?)
Personally I don't mind an offer to subscribe to the newsletter but Substack is way too aggressive. They show the prompt even before I have finished the article (How do I know if I want to subscribe?) and obscure the article (actively working against what they know I am trying to do). So I now just immediately back out when I see that. I won't visit sites that are purposely harming the experience.
Physical feels that way to me sometimes. In the US, I get assaulted on a constant basis by mailers and ads for things I never expressed any interest in. Waste of time, waste of paper, waste of resources.
Sites used to open up popup windows, then browsers got better at blocking them. But instead of taking the hint that people didn’t want that, they just moved the popups inside the page content. The prevalence of ad / tracking blockers is entirely due to the user-hostile actions of site owners.
When there are a few simple nice things making our lives a little bit more bearable, there are always other zealous assholes desperate to ruin that.
Here I speak about this site, but everyday we have new cases of that. Like "new tax on anything that starts to be popular" for France, or Google trying to kill our privacy and F-Droid by requiring all app devs to have attestation from them.
So interesting that with this link, I saw the whole article is a couple paragraphs. With the original link, I gave up after the second ad that almost covered my whole screen on mobile. Too many ads are a terrible user experience that doesn’t let us read anything.
Something new I found - go to any link (news article) on commondreams.org in Firefox. Now use the Reader View button in your address bar - they've figured out how to hack Reader View to not show the content and only a beg screen for money.
For the last month or so archive.is has not been working for me; is it maybe related to this? Btw I always assumed the owner is from Russia, because he or they were so secretive about everything except an occasional blog post or Q&A, and Russians are usually obsessed with their web pet projects.
For real someone needs to make legit business of archiving the web where you would have timestamped hashes of your archived web pages and "unlimited" storage for archiving ofc only if you pay for the "unlimited" storage.
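To make the "timestamped hashes" idea concrete, here is a minimal Python sketch. The function names and record fields are made up for illustration and are not any existing service's format. It only ties a SHA-256 digest to a self-reported capture time; an actual service would presumably also need a timestamp signed by a trusted third party (for example an RFC 3161 timestamping authority) before the record proves anything to someone else.

```python
import hashlib
from datetime import datetime, timezone


def make_archive_record(url: str, body: bytes, captured_at: datetime) -> dict:
    """Build a record tying a capture time to a SHA-256 digest of the page bytes.

    Hypothetical record layout, for illustration only.
    """
    return {
        "url": url,
        # Normalize to UTC so records from different machines are comparable.
        "captured_at": captured_at.astimezone(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(body).hexdigest(),
        "size": len(body),
    }


def verify_archive_record(record: dict, body: bytes) -> bool:
    """Return True if the stored bytes still hash to the recorded digest."""
    return hashlib.sha256(body).hexdigest() == record["sha256"]


if __name__ == "__main__":
    page = b"<html><body>Example article</body></html>"
    record = make_archive_record(
        "https://example.com/article", page, datetime.now(timezone.utc)
    )
    print(record)
    print("unchanged:", verify_archive_record(record, page))
    print("tampered:", verify_archive_record(record, page + b" "))
```

Anyone holding the record can later re-hash the stored copy and detect tampering, which is the property the paid archive would be selling.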
The subpoena cites the following statute as authorization: "(1)(A) In any investigation of (i)(I) a Federal health care offense; or (II) a Federal offense involving the sexual exploitation or abuse of children, the Attorney General; or (ii) an offense under section 871 or 879, or a threat against a person protected by the United States Secret Service under paragraph Secret Service determines that the threat constituting the offense or the threat against the person protected is imminent"
One of the agents named in the subpoena appears to have previously worked on child exploitation cases years ago:
That seems like something that should be handled with a simple takedown request, and those behind archive.is would almost certainly comply. 99.999% of people using archive.is are using it to bypass news article paywalls, nothing more. Which, if we're honest, is the real reason why the FBI is going after them.
Personal anecdote but I almost never use these archive sites to bypass paywalls. I only use it when I want to see how establishment news sites somehow sometimes accidentally tell the truth, then, when they get the call, they try to purge their original reporting. Again, it might be my personal bias, but in my opinion, this is the main reason they are going after them. Because these websites let people prove the hypocrisy and the lies.
I remember that when[0] Reuters took down that one story about organized crime, and further DMCA'd the Internet Archive to take down their version, archive.ORG cheerfully did the memory-hole thing—while archive.IS stayed up.
If the (Western) internet were to turn into a monoculture of Western-domiciled big corporations, that kind of censorship would be *effective*. Our systems aren't robust against bad-faith actors attacking the free flow of information. (And the root cause of the planet-spanning censorship cascade in that example was, unambiguously, bad actors. A crime syndicate based in India).
The fact the internet is global and freely connects to legal jurisdictions and cultures very different from the West's, is to the West's benefit: it creates an escape-hatch for things that fall between the cracks of our nascent totalitarian technologies.
I get freaked out when I consider the future of archive.is. Thanks to the nature of the web today, it is incredibly fragile.
As the co-creator of a censorship-resistant publishing platform, I really wish we would migrate to a peer-to-peer technology. We could develop network effects on a decentralized platform with a cryptographically-provable network of trust. Most people don't realize it is possible to handle media distribution in a robust way.
I'm not just trying to shill my solution! I wish there were more competitors using these techniques to try and save the web.
Utilizing p2p tech is not illegal. It is illegal to redistribute copyrighted content without authorization- and we are working to build this into the protocol so that peers will respect copyright by default. People can redistribute at their own risk. I'll be the first to admit that this is complicated, and we have a long way to go in this regard.
Plus, the vast majority of people will just use the web frontend, with a peer on the server. Most peers can be hosted by content creators and tech-savvy friends+family.
Almost every machine in the world participates in at least one peer-to-peer network: Windows Update. There was a time when the Steam client also used bittorrent technology, not sure if they still do.
Obviously P2P gets used in various things, my point was just, that (most) people likely won't willingly join P2P networks to fight "censorship" or help archive things with questionable content or tainted with potential copyright infringements.
We need to preserve data. The FBI is trying to kill data.
We can not allow the FBI to work for Evil here. I actually think there should be a human right to data. With that I mean, primarily, knowledge, not to data about a single human being as such (e. g. "doxxing" or any such crap - I mean knowledge).
Knowledge itself should become a human right. I understand that the current law is very favourable to mega-corporations milking mankind dry, but the law should also be changed. (I am not anti-business per se, mind you - I just think the law should not become a tool to contain human rights, including access to knowledge and information at all times.)
Wikipedia is somewhat ok, but it also misses a TON of stuff, and unfortunately it only has one primary view, whereas many things need some explanation before one can understand them. When I read up on a (to me) new topic, I try to focus on simple things and master these first. Some Wikipedia articles are so complicated that even after staring at them for several minutes, and reading them, I still haven't the slightest clue what they are about. This is also a problem of Wikipedia: as so many different people write things, it is sometimes super-hard to understand what Wikipedia is trying to convey.
Probably a bit of a 'baby with the bathwater' situation here. At almost no point has that institution been a net positive - at times snooping on 'political dissidents' (like MLK Jr.), and at others bungling cases so bad they become moments of national shame (Ruby Ridge).
You're never going to get a system with a clandestine domestic service running ethically for long, esp. not with qualified immunity. It's simply too attractive to dumb psychopaths with delusions of grandeur and concurrently not of interest to people with a strong sense of community or morals.
> At almost no point has that institution been a net positive
Hard to measure, isn't it? In the eyes of the millions of Americans who have at some point in their life been victims or related to or friends of victims of some kind of serious crime, the FBI has oftentimes been helpful and/or the prospect of being caught has been a deterrent for crimes.
You contrast that with all the bad that has come from there, of which there is surely plenty, but how come you claim that the bad obviously must outweigh the good?
You're right that I'm taking a bit of a shortcut - my assessment is based on what I know to be true in both directions, the things they've done right versus the things they've done wrong. The CARD program, stopping the Times Square bomber, Don C. Miller, Zazi versus COINTELPRO, Stingray, MLK, Ruby Ridge, basically everything J Edgar Hoover ever touched (like the Palmer Raids), Steven Hatfill and Brandon Mayfield.
If you ask me, I'd trade the good for enduring the bad.
My shortcut is admittedly a sloppy heuristic (because what else do you have for unknowns like this); for the unmeasurable effects, my bet is that they skew roughly the same as the measurables. For every serial killer who thought twice, there have probably been many political activists who have also thought twice. The deterrent effect cuts both ways if your actions cut both ways. We also know about enough falsely accused / imprisoned that we can assume we ain't figured them all out. For every family that feels safer with the FBI around, there are families that feel less safe, because people "like them" have been framed, murdered, snooped on, suppressed, and criminalized.
So yeah, it is hard to measure - but not impossible to come to a conclusion, as far as I'm concerned.
Another way to look at it is this: if you're going to hand the mandate of violence and skullduggery to an institution, you should be damn sure that they have standards and practices that solidly enforce competence and ethics - and even considering the good, we know pretty conclusively that they have failed in this regard. I don't want to play Russian roulette with law enforcement - they should get it right almost all of the time or step aside so someone who knows what they're doing can handle it.
If you choose to engage law enforcement personnel, it's "thank god, some extra protection" (hopefully!), but if there is a situation where law enforcement personnel engage you, it's either "huh?" or "oh fuck". This isn't different for the FBI than for local or state-level police.
If some law enforcement personnel show up that you didn't invite, they could be there for a large number of reasons. How worried you'll be depends on how likely you think they are to do what they're supposed to do instead of what they're not.
If they're canvassing for witnesses, are they going to charge through your yard and shoot your dog? If they're investigating someone else, how likely are they to try to come up with something unreasonable to charge you with for leverage and then make you plead it down to a penalty that still isn't zero in exchange for giving them information you might not even have and would then be forced to choose between fabricating to get the deal and "not cooperating" and getting a serious prison sentence?
If someone is attempting to SWAT you, how likely are they to ascertain the situation instead of shooting first and asking questions later?
If their investigation has led them to you for some reason even though you're innocent, do you expect them to care about the truth or just railroad you?
If you hear the name of a particular law enforcement agency unexpectedly when you don't have any reason to think you've done anything wrong and your instinct still has to be "oh fuck" then they're bad at their jobs.
I think most people would have essentially the same reaction to either FBI or state/local police showing up at their door with "[Police|FBI], open up!", and it depends more on whether they believe they've done something illegal than the reputation of the agency. This was my disagreement with GP(stavros).
Depending on how you expect the reader to answer all your questions, we could still be in full agreement, but my sense is that you're asking them rhetorically?
If this was true, the Miranda rights would read something like “anything you say will be used to obtain justice” rather than “anything you say can and WILL be used AGAINST you.” The police and justice system are never your friend. They are always your adversary, and should be treated as such. Under a different regime, they could be your ally if you’re innocent, (and this is the case in many countries) but in the US, they are always hostile to everyone, including innocent people. Even if individuals in that system don’t fancy themselves in that light.
I assume that’s why the original argument is that it’s not been a net positive. I.e. the assumption is that lots of work can be good and necessary, while even more that is evil and excessive can end up with a net negative.
Anticorruption work is good and necessary. If the FBI's work was any good, they would be investigating the funding of the destruction of the White House, or AIPAC and Qatari influence in DC, not Comey and Obama. Right now, they are working for Evil.
Or the Trump coin crypto rugpull and money laundering scheme. Or the open insider trading. Or the $400 million jet "gifted" from Qatar. This year has been one grift after another.
I already live there because the only enforcement that happens is trying to extract money from poor people to fund the local court and cops. Pulling over every car coming down a particular road and trying to charge them with DUIs for smoking weed 8 hours beforehand does not make me safer, it just makes me late for work and is used to justify tax increases on me to further fund the bogus drug war.
US law enforcement "clears" about 1 in 4 robberies and more than 1 in 3 aggravated assaults/batteries, and similar numbers for other crimes. On average, a criminal's career is 3 serious crimes. You can imagine how much awesomer your life would have been if they were able to run uncaught for years and years. But you won't because you have "net negative" bullshit blocking your vision.
typical bootlicker mentality; all criticism of state violence is rejected out of hand because the idea that power can and should be held to a higher standard is anathema to the authoritarian mindset.
I'm sure police (when they aren't fighting over jurisdictional issues) find it helpful, that doesn't mean that it's helpful for the population, especially when it (and police generally) are used as a tool for domestic influence operations and to basically shunt some people aside in the name of business and landowners.
i could absolutely be wrong since your post was kinda vague, so forgive me if i’m wrong, but are you implying we shouldn’t attempt mitigation of bad things because other bad things are happening elsewhere?
Deletion of data is the most permanent thing most people will ever do. The burning of the library of Alexandria and the razing of Baghdad left a long, long shadow on history.
Most of the time when I see this snark, and look it up, it turns out that the "original" inventor did only the most basic step or vague foundations and never refined it further or explored any potential applications.
Most often it happens with China since they spend a lot on propaganda to present themselves as the true inventor of everything.
you jest but it is wild how often people declare “if it was ‘advanced’ and outside of Europe, it was probably aliens. not the people, it was aliens, obviously.”
Well, being ranked #275 on the list of 5 things I'm going to care about today means that I'm going to go "hmm that's interesting" and then move on with life.
If one is naive to the fire in the house, the egg on the counter, as something worthy of concern, might get them to look around and see that their house is not actually suitable for use as shelter. People who are trying to put out the fire (or who are simply concerned about it while watching from a distance) might decide to point to the spoiled egg to spread awareness of the fire to the people inside.
I think you have to change or abandon the metaphor to make your point. These are not true statements of spoiled eggs and house fires, and so much so as to make a reasonable claim about institutions and malfeasance look absurd.
> These are not true statements of spoiled eggs and house fires, and so much so as to make a reasonable claim about institutions and malfeasance look absurd.
True, but I disagree with the conclusion. When I try to map it back to reality and it doesn't make sense, it is indeed an indictment of the analogy. But the fact I have to abuse the analogy to make that mapping coherent is not my problem; it's not my analogy.
However, within the context of the analogy, and if one can imagine that absolutely insane scenario, the logic holds.
An egg gone bad and a house fire are not the same sort of concern. A government agency doing two types of evil thing is more akin to ranking whether the small fire in the living room is more important than the slightly larger fire in the media room. It’s all fire in your house.
the phrasing makes it sound like historically the FBI does not work for evil. which is somewhat annoying to someone who believes the FBI has been primarily a tool of evil.
I just wish the default way people used archive.is was to generate their long form link instead of their short link, as, if the site ever goes down, all of the links people have posted where they don't change the setting and thereby paste the default inscrutable code link will be destroyed... building a service with a pernicious behavior like that is ALSO not okay in its own way.
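The practical difference is easy to show in code. Below is a rough Python sketch, assuming the long-form link has the shape `<archive host>/<timestamp>/<original URL>`. That layout, the `archive.example` host, and the function name are assumptions for illustration, not a documented format. With a long link the original address can be recovered even if the archive is gone; with a short code there is nothing to recover.

```python
import re
from typing import Optional

# ASSUMPTION: long-form links look like
#   https://<archive host>/<timestamp>/<original URL>
# where the timestamp starts with a four-digit year. Short links are just
#   https://<archive host>/<opaque code>
_LONG_FORM = re.compile(r"^https?://[^/]+/(\d{4}[\d.\-]*)/(https?://.+)$")


def original_url_from_long_link(link: str) -> Optional[str]:
    """Return the original URL embedded in a long-form archive link, else None."""
    match = _LONG_FORM.match(link)
    return match.group(2) if match else None


if __name__ == "__main__":
    long_link = "https://archive.example/20250101120000/https://news.example.com/story?id=1"
    short_link = "https://archive.example/AbCdE"
    print(original_url_from_long_link(long_link))   # original address survives
    print(original_url_from_long_link(short_link))  # None: the code is opaque
```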
This actually seems like a big design flaw in resource locators. Perhaps someone here can make an alt DNS that resolves to new homes for content when the Canary dies.
IMO the natural right is for humans to share what they've learned up to and including verbatim reproductions of works by others. I also think that abridging this right to grant some exclusivity for artists (the broader "art" meaning scientists/writers/authors/musicians/coders/etc) is suitable. Copyright is/was a good idea. Its fair use clause is a good idea. The duration of exclusivity under current laws, however, seems excessive and beyond mere encouraging art.
There's no contradiction in wanting an abolition (or at least substantial curtailment) of copyright while also being upset that mass violations of copyright magically become legal if you've got enough money.
Enforcement being unjustly balanced in favor of the rich & powerful is a separate issue from whether there should be enforcement in the first place—"if we must do this, it should at least be fair, and if it's not going to be fair, it at least shouldn't be unfair in favor of the already-powerful" is a totally valid position to hold, while also believing, "however, ideally, we should just not do this in the first place".
> There's no contradiction in wanting an abolition (or at least substantial curtailment) of copyright while also being upset that mass violations of copyright magically become legal if you've got enough money.
Why can't you just be happy for those few who are lucky enough to be able to violate copyright with no consequences? Yes, I know you'd want everyone to be able to violate copyright, but we're not there yet.
"Why can't we just be happy" that individuals and smaller companies get sued into oblivion over copyright violations, while large AI companies can scrape everyone's data and use it for training and completely ignore copyright while generating code and images and text and music based on all that that displaces the demand for the originals? Is that what you're asking?
Because we’d like the powerful to feel the crunch from bad law rather than get a backdoor, so they have to use their power to change things for everyone instead of just getting it changed for themselves.
More often than not the rich just codify the "backdoor" for themselves in such cases. A rich man can buy the $30,000 registered machinegun and pay the $200 NFA stamp and be 100% legal; the poor man who 3d prints $0.50 of plastic to do the same thing goes to jail for 15 years.
The entities training AI are not anti-copyright, or anti-intellectual-property. If I were to steal their AI models they would sue me into the ground and probably win. Furthermore, even if you are anti-copyright, you probably still don't want your shit scraped by AI trainers since the bots are extremely aggressive, almost like a bona fide DDoS attack.
AI is not an attack on copyright, it is an attempt to replace it with something worse.
You're assuming way too much with "not there yet". The point is the corpos will violate copyright with impunity today, and then in a few years sign a bunch of settlement agreements and pull the ladder up behind them.
I'd love to see copyright slowly become irrelevant, but even with that goal we should expect to see large corpos being the last to stop respecting it.
Pirating an old movie to sell is not considered ethically problematic everywhere. In many, many countries on earth pirated DVDs were sold at the marketplace, and no one – buyer or seller – had qualms about it. When the authorities shut down such sales, it was almost entirely because they were being pressured by the USA and a handful of other Western governments, not because the local ethical perspective on this changed.
This genre of comment is so tedious. We aren't talking about everywhere, the FBI is a US agency, the big companies we're discussing have won in US court. This thread is about the US.
The FBI are police, not the entire American people. Courts are courts, not the entire American people. Among the American people, there are those who don’t find selling pirated media ethically problematic and would like to see the kind of marketplace sales and wide use of Bittorrent boxes that people in other countries have enjoyed. If something isn’t an ethical universal, that means people in your country can hold a pro-piracy view, too.
While it's true people are upset at AI companies profiting off of artist creations with no compensation, I know a lot of people are also reacting to how the recent AI companies have been scraping the web. The reason folks are using Anubis and other methods is because unlike Google which did have archiving of sites for a long time (which was actually a great service), these new companies do not respect robots.txt, do not crawl at a reasonable rate (for us, thousands of hits a minute from their botnets - usually baidu/tencent, but also plenty of US IPs), hit the same resource repeatedly, ignoring headers intended to give cache hints, stupidly hitting thousands of variations of a page when crawling search results with no detection that they are getting basically the same thing... And when you ban them, they then switch to residential ranges. It really is malicious.
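For reference, the well-behaved version is not hard. Here is a minimal Python sketch of the robots.txt and crawl-rate checks using the standard library's `urllib.robotparser`. The robots.txt content, the `ExampleBot` user agent, the helper names, and the one-second fallback delay are all made up for illustration; a real crawler would fetch the site's actual robots.txt and would also send conditional requests to honor cache headers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, inlined so the example runs without network access.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /search
"""


def build_policy(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a policy object."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """True if robots.txt allows this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


def delay_seconds(parser: RobotFileParser, user_agent: str, default: float = 1.0) -> float:
    """Seconds to wait between requests: the site's Crawl-delay, else a fallback."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


if __name__ == "__main__":
    policy = build_policy(ROBOTS_TXT)
    for url in ("https://example.com/about", "https://example.com/search?q=x"):
        print(url, "allowed" if may_fetch(policy, "ExampleBot", url) else "skipped")
    print("wait between requests:", delay_seconds(policy, "ExampleBot"), "seconds")
```

Skipping disallowed paths like `/search` is also what avoids the "thousands of variations of a page" problem described above.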
If you boil it down to the AI companies are making money (subscriptions, etc.) based on content they did not pay to produce, then they are profiting from someone else's hard work.
That's not entirely true. Google might or might not hide your pages from the index. They're definitely going to scrape them anyway. They also display summarized info from your page (the famous "what is scrapping" joke showing Wikipedia's summary). Finally, you might just get your answer without visiting, just by skimming the result description.
Well, don't we have enough Acme Corporations in the world that were unprofitable and existed purely on VC life support before they killed off all the competition by dumping the prices, and then made them skyrocket to recoup investments and become profitable after becoming monopolists?
People at these companies are receiving a salary to do these things that the person you're responding to is opposed to.
While not all the companies in question may be profiting from these things, some of them are, and most if not all of their employees certainly are as well.
I wish the companies would just pay a few technically-competent companies to do the scraping. Pay two so you can check their work, maybe, but let's get past the point in time when dozens (or more?) of companies are all simultaneously hammering the web.
My pie in the sky pitch is the US Government (and others) should solve this, the legality and the compensation problems, in a single swoop. Make submission of your work to a federal model data set a requirement for obtaining copyright protection. License the data set (and heck maybe even charge for making custom models) for nominal fees to anyone who wants it, with indemnification against copyright lawsuits for works deriving from the licensed model. Pay copyright owners a limited time royalty from these licensing fees. Everyone wins and we can stop needing a billion bots scraping a billion sites a billion times a day.
While I would like to see it abolished entirely (including patents) I do have to compliment how you've described a formula that is actually possible to implement.
To deny people access to things is one thing, wanting to do it by impossible means is quite something else. Who even has time to scavenge the universe looking for possible infringement on their works and also the money to deal with it?
A lot of the outrage isn't at scraping, it is at the disruptive techniques used to do so. Like web-scraping whole websites that already provide convenient images of their content for download.
Feels like now we're just redefining our rules so that the people we don't like are out and the people we like are in. Does the content creator have the right to determine how their work is used or not?
This is a false equivalency I'm surprised no one else has brought up. An archive of a site preserves attribution inherently; scraping and training do not.
Is it? I thought it was ridiculous at first, but the more I think of it... both are scenarios where a corporation is scraping billions of webpages. We like the reason archive.is does it, but unless it's some kind of charity, I think it's a reasonable comparison.
archive.is is a charity no? Or at least they take donations, it seems the legal entity behind it is nebulous, but they don't have ads and have no paid product or offering.
They sure as shit do have ads. Have you ever accidentally followed a link using a browser profile that has no ad blocking enabled?
I only rarely browse without some form of content blocking (usually privacy-focused... that takes care of enough ads for me, most of the time). I keep a browser profile that's got no customizations at all, though, for verifying that bugs I see/want to report are not related to one of my extensions.
Every once in a while, I'll accidentally open a link to a news site (or to an archive of such a site) in that vanilla profile. I'm shocked at how many ads you see if you don't take some counter measures.
I just confirmed in that profile: archive.is definitely puts ads around the sites they've archived.
Corporations large and small don't do anything. It's always a person. The question you are answering, even if you don't think you are, is whether a few people can get together and act in concert and still retain their rights.
I imagine there's a whole lot of snarky epitaphs which the remnants of the humankind could place on this civilization's gravestone, but citing this exact law might make for the best one.
Only in a couple of very specific and narrow ways. They are not considered persons generally under US law. They are legal fictions that have been granted a subset of rights that people have.
It would solve a lot if that was taken to the extreme. Sorry Amazon, but your working conditions killed five people. Your business license is going to jail for 40 years, good luck getting contracts with other companies with murder on your record when you get out.
Where US law applies varies by which law it is; there are US laws that apply only outside of the US [0], as well as US laws which have application both inside and outside the US.
[0] e.g., the federal torture statute, 18 U.S. Code § 2340A(a), “Whoever outside the United States commits or attempts to commit torture shall be fined under this title or imprisoned not more than 20 years, or both, and if death results to any person from conduct prohibited by this subsection, shall be punished by death or imprisoned for any term of years or for life.”
I mean, it's not like it was just Biden. His extradition proceedings took place during three different US presidential administrations. You might as well include Trump and Obama in there as well.
It's not that they're scraping the internet, it's that they're scraping the internet, profiting off the data they take, and still using the copyright regime to go after others who do unto them.
Hot take here, I know, but some of us believe the law should treat large corporations differently than it treats individuals when it comes to their rights and privileges.
This seems like an incredibly disingenuous take. There's a marked difference between collecting information to freely share with the rest of humanity, and collecting information to feed into algorithms under the guise of "artificial intelligence" with the pretense of enriching their finances and putting others out of work.
That's a bad take, just like open source code is available to all, it's not the case you can always resell it or repackage it for your own profit.
Information can be made available to all, and at the same time, we can make it so others cannot resell or repackage it for profit like what AI companies are doing.
> I actually think there should be a human right to data. With that I mean, primarily, knowledge, not to data about a single human being as such
How do you suggest we fund the difficult work needed to investigate, research, and produce such data?
Remember that facts are not copyrightable, and as such, can't be restricted by copyright. Creative expression of those facts, on the other hand, can be.
> Wikipedia is somewhat ok, but it also misses a TON of stuff, and unfortunately it only has one primary view, whereas many things need some explanation before one can understand it.
Last I checked, they had archive.is blacklisted; the people with power there had (as far as I can tell) come to the conclusion that people using that site to prove that websites had stated X on date Y were the bad guys. Of course, they still have archive.org sources everywhere, so the objection is not actually to archiving page content.
Tons of claims also seem to be sourced ultimately to thinly-disguised promotional material (e.g. claims of the prevalence of a problem backed up by the sites of companies offering products to combat the problem) and opinion pieces that happen to mention an objective (but not verified) claim in passing.
The difference is that we know who's running archive.org. We don't know who's running archive.is. That's perfectly fine for private use but unacceptable for a site like Wikipedia.
This type of possibility really worries me. Archive.is is much closer to actual history in many ways. If the data there starts getting corrupted or biased, there’s no way to know what was truly there.
The idea that the permanent record of the internet could hinge on the ethics of one stranger behind a server rack is deeply unsettling.
I never said there are examples. However, "who you are" matters, even if you don't care. At least when the who is known, we can gauge the trustworthiness and what bias exists, because there's always a bias. When who is not known, you don't know what bias to account for. That's not trustworthy and not reliable. And when the site is closed source and you have no idea how it's being run, nor by whom, you don't know "what you do" either.
I never concluded that, but this actually allows someone, the anonymous operator here, to change the history/info retroactively if needed. For Russians, as an example, this would be a powerful tool to manipulate the narrative, which is cultural there at this point. Pretty smart and dangerous if it is really operated by them.
Ignoring the fact that one of this service's primary reasons for existing is that corporations and governments are already abusing their ability to retroactively change history.
If this is truly a concern then the answer is to have more than one publicly-accessible independent archive service. Archivebro has never taken any steps towards securing a monopoly on archiving things. The FBI are the only ones doing that.
Also not everybody in Russia is on the FSB payroll. News media always stops investigating as soon as there is credible information that somebody or their server is located in Russia because if they learn too much then it becomes difficult to discredit them as "possibly being linked to the kremlin". If you used any other nationality to imply that somebody is acting in bad faith on behalf of a hostile foreign government without additional evidence those same journos would call you a racist and try to get you canceled.
I heard stories about a potential Oracle data breach (I think mainly affecting their customers) being removed from Archive.org too. It’s because in general, they comply with requests to remove stuff, which is understandable from an ethical perspective. But do they at least try to explain the reason for the takedown? Is it just not feasible to do that?
This is no longer true. They changed their policy to ignore robots.txt in 2017. I seem to recall that they still respected robots.txt later, though I can’t find any more information on it and may be misremembering. Currently, they do not.
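For anyone unsure what "respecting robots.txt" means in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleArchiveBot` user agent, and the `may_fetch` helper are all made up for illustration; this is not how any particular archive's crawler is implemented. A crawler that ignores robots.txt simply never performs this check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site (an assumption, not a real file).
ROBOTS_TXT = """\
User-agent: ExampleArchiveBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The archive bot is shut out entirely; other bots only lose /private/.
    print(may_fetch("ExampleArchiveBot", "https://example.com/article"))
    print(may_fetch("OtherBot", "https://example.com/article"))
    print(may_fetch("OtherBot", "https://example.com/private/notes"))
```

The point being that robots.txt is purely advisory: the site publishes rules, and it is up to each crawler whether to run a check like this before fetching.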
My main use for archive.is is for sites that somehow cannot be archived (a message will show up mentioning this site cannot be archived or something along these lines).
archive.is is generally pretty good at forcibly attempting to get an archive; if the HTML doesn't work, the screenshot will work fine. Although archive.is doesn't seem to handle gifs/videos.
> Last I checked, they had archive.is blacklisted; the people with power there had (as far as I can tell) come to the conclusion that people using that site to prove that websites had stated X on date Y were the bad guys.
Or they're worried about the paywall by-passing functionality (which is probably what a good portion of people use it for) and copyright claims against archive.today potentially having it taken down and thus breaking a lot of links.
"Knowledge", for the most part is. What I see archive.is get used for most frequently is circumventing paywalls on paid-for media websites, which is journalism. And while freedom of the press is a constitutional right in functioning democracies, freedom of access isn't enshrined as much. But most of the things are background articles, the actual news is freely available to all still.
I'm all for archiving open webpages though. And I'm honestly surprised the Internet Archive is still standing. Their decision to open up their book library was a dangerous mistake.
If the last 9 months have shown us anything, it's that long-running government institutions are a lot easier to kill than we thought. And the idea of archive.org being under the control of an administration like the current one in the US is pretty frightening. They would have absolutely zero qualms about deleting and changing that data.
Archive dot org deleted a lot of stuff during their "hack" a while back. I'm convinced it's already been compromised. The US/EU/every government wants the ability to rewrite history.
Look up the article "Who Archives the Archivist?" (it's difficult to find. Use quotes. Don't link it; the site is banned here).
It's not up to us to tell the FBI what to do, that's a fatal misunderstanding about how power works. You can demand to see the FBI's manager, but I doubt it will get you anywhere. You can choose between two candidates offered by the privately owned and run political parties for whom the FBI works, but I don't think that will help either.
> Knowledge itself should become a human right.
Human rights are created by legislation. Unless you own a legislator (or rather, many legislators), you will not be involved in this. The people who own (and parcel out) knowledge itself, however, will be involved.
It would be better if we stopped making pronouncements about what people more powerful than us should be doing. It's like prisoners talking about what the jail should be doing. You should talk about what you should be doing. And don't mistake demanding for doing, or walking in the street with your friends for activism (unless you're violating curfew and are prepared to defend yourselves.)
Be brave. Put forward a program that might fail. Ask people to help you with it, ask them to follow you, tell them where to show up. Join someone else and help with their program. Don't demand, then whine when they say "of course not." The FBI is not your daddy, and the people running it are not running it on your behalf.
I don't mean to be personal, but this type of talk is empty. The way things get decided is through power; and the way weak people exercise power is collectively, through discussion and coordinated action. Anybody can talk about what they would do if they were dictator of the world.
All you are guarding against here is some bits in a machine. Knowledge can be embedded in other substrate, other medium. Acquired by more actions than reading social media.
IMO what you really mean is "I should be free to sit and surf the web secure in my belief others are acting properly, while subsisting on externalized labor that props up my biology".
Asimov and countless others highlight this difference between being a passive reader of others' ideas as orthogonal to knowledge acquisition. If you aren't conducting the experiments you acquired nothing but memory of someone else telling a story.
4% in the US hunt now. So to get people living rather than acquiescing, all you office drones are going to have to learn your way out of helplessness. Go acquire knowledge of how to grow a potato.
You won't because you don't want to acquire knowledge. You want the world to gift you knowledge and experience through as little effort of your own as possible. Typical American capitalist. 8 billion across the globe aren't that impressed by 300 million's obvious grift.
We tried making knowledge free and available to everyone online. Capitalists came and gobbled it up to sell back to us as "AI". Unfortunately we can't have nice things with people taking advantage of it.
> We need to preserve data. ... I actually think there should be a human right to data.
I'm not going to simp for the FBI here, but come on: do you have a human right to preserve my private photos leaked by a stalker or a hacker? Because archive.is is famously unwilling to play nice here.
I don't know if this case is about that, or about pirated content, or about the administration trying to scrub something embarrassing off the internet. But the fact that archive.is cheerfully enables all three "use cases" should probably give you a pause.
It's a delicate line to walk because takedown processes can be abused to do things we don't like. But "lol, tough luck, information wants to be free" is not a sensible blanket response in a polite society.
I don't know why all the archive sites don't share backups. The Wayback Machine and archive.is are the largest archive sites by far, and they don't share bulk downloads of the majority of the websites they catalog.
They of course don't have to, but having something like Anna's Archive but for website history would be great.
My guess is the sheer amount of data that archive.org has, which means:
- even higher costs associated with seeding archives (egress traffic, storage iops capacity required etc)
- chances of finding a 3rd-party seed for an arbitrary file would be pretty slim, which means seeding on your own most of the time, which would make this hardly any better than offering files over HTTP only.
They pardoned the Silk Road drug lord to go after a copyright infringement-lord instead? It's not even in their effective jurisdiction, if this indeed is a Russian national. Don't they have more important Russian crimes to investigate?
I read there was a US government investigation tracking Ukrainian children abducted by Russian forces, but supposedly there weren't enough resources [0] to sustain that.
> They pardoned the Silk Road drug lord to go after a copyright infringement-lord instead?
The president’s pardons are not popular with the FBI and law enforcement. The FBI is not happy about doing all of the work to prosecute people only to have the president override it for political reasons.
The reporting I've seen is that they were making efforts to get rid of anyone at FBI who would be upset at this. They are also reported to have an employment screening question that requires applicants to say the 2020 election was stolen.
That is not political, it is purely a service offered at a price. There is no specific political agenda behind these pardons (i.e., they don't pardon only folks who are, for example, Evangelicals or anti-immigration or whatever), the only criteria is payment.
> Politics (from Ancient Greek πολιτικά (politiká) 'affairs of the cities') is the set of activities that are associated with making decisions in groups, or other forms of power relations among individuals, such as the distribution of status or resources.
I don't know how much more obvious I can make this for you. Bribery is political.
Bribery is political. But it's not taken to be a usual part of politics in the West. (Similar to how the Roman word for ambush was the same as their word for treason. Treason isn't taken to be a usual part of politics. Ambush, for them, not a usual part of warfare.)
Basically, you're both right because what is and isn't political is itself a political question.
Word meanings evolve. Virtue literally means "manliness" in Classical Latin but only a pedantic dick would insist we use it in that sense. Polis and its related words meant something different to the Greeks than they do to us.
Right, "Politics" evolved from "affairs of the cities" to "the set of activities that are associated with making decisions in groups, or other forms of power relations among individuals, such as the distribution of status or resources.".
Selling pardons for money is inherently a very political act. It means that you are aligning yourself with moneyed interests, which is clearly the heart of Trumpist politics. Setting ideology aside even, the open selling of pardons sends the message to the moneyed interests in general that he's on their side, even if they don't need a pardon at this exact moment. It serves both practical (get rich people to like you and therefore donate money to your campaigns and causes to help them succeed) and ideological (supply-sider-esque doctrine going back to at least the protestant reformation says that rich people should be in charge because they're rich, QED) purposes.
I am pretty sure that they only pardon people who are pro or at least neutral to Trump. I doubt that he would pardon anybody who's an outspoken critic even if they offered him a bribe.
The hypothetical pardon was promised by then candidate-Trump in a speech at the Libertarian Party convention.
The specific political agenda was to get support from libertarians, who lean conservative, but don't like Trump much - because he rejects libertarianism.
That's as political as you can possibly get. It wasn't a behind the scenes thing. It was literally announced at a political convention.
I guess OP means to say it is not for ideological reasons.
OP means to say this type of pardon is not meant to win votes or satisfy the demands of constituents, as with convicted cops, people with weed-related crimes, or the pardoning of draft dodgers after Vietnam or the Civil War, and so on.
While money is deeply involved in politics and financial corruption is there, occasionally ideological (political) actions without direct financial benefits also happen.
It is hard to say whether this pardon of the Silk Road founder was motivated by libertarian or crypto community pressure, or by financial donations to the party, etc. Both are possible even at the same time, but they are different considerations.
> I guess OP means to say it is not for ideological reasons.
“Government exists for the personal benefit of the leader” (or simply “for my personal benefit as the leader”, with even less generalization beyond that) is an ideology, actually.
It’s not one that is popular to embrace publicly, but that's hardly unique among real ideologies.
In 2019, Giuliani's assistant chided John Kiriakou that pardons couldn't be discussed in his presence but that the fee was $1 million for Giuliani and $1 million for Trump. Given inflation, I'd bet that pardons now cost around $3 million.
You are talking about Kash Patel's FBI. The guy who has a hit song and book called "The Plot Against The King" pretending the 2020 election has been rigged and who maintains it to this day.
The FBI does what Trump tells them to do, that's it.
And why aren’t the people who “stole” the election being prosecuted by Trump's DOJ and FBI? He had proof, remember? I wish liberal media would hammer this point and expose the lie for what it is.
They got me—a copyright infringement lord—too. The FBI profile assigned to me even wrote in a case study that the FBI thought I was making millions, amongst other misses.
Would love to hear more about this if you're inclined to share or have written about it somewhere. Legal contacts between the federal government and individuals are often surreal.
I read this, and found it to be a disappointing read. It had few details, and instead was more of a social sciences paper, covering basic ideas in academic language.
Roughly it seemed to be suggesting that:
* It's easier to deceive someone if they first solicit for help on a forum
* You can trick someone into revealing sensitive info like which infrastructure provider is used by nerdsniping them: "My mate thinks you should just enable health checking on AWS ELB", and then they reply "Well actually I use Hetzner". Except I'm guessing it was more elaborate than that.
I guess I wasn't the target audience of the article though.
joshmn, what did you think of the article?
Do you find it difficult to trust random commenters online now?
I see you mentioned you can't discuss technical details, but if/whenever that expires (?), that'd be great to hear.
I also found it underwhelming, though I'd like to think I’m the most scrutinizing of the subject matter. There's some nuance between my take on my behavior and the profiler's, but I'll give them the benefit of the doubt—they only had my Reddit posts to go on and had to package that for investigators.
I still tend to trust by default and make witty comments or jabs that sometimes land flat, so the article was accurate in that sense.
As for talking to the undercover, I made a point of keeping no secrets about my site's technical implementation. Between me and some "competitors," I was usually the first to respond to upstream provider changes—I'd even share my findings without expecting anything in return. Anyone could’ve asked about my issues, and I would've told them.
Trust is the most valued currency in the piracy world, and I worked hard to earn it with both peers and customers. Acting otherwise would've gone against that—and against my own morals. My being neurodivergent may also be worth noting in my willingness (or unwillingness from a free-will perspective) to trust others.
Technically speaking, the site worked by reverse-engineering the league's official streaming services—a few curl requests, careful observation of responses, and adapting them to my needs. There's more to it, of course, but my 2016 MVP was barely 50 lines of Ruby and a plain HTML file. TorrentFreak got some of the details right.
The US gov doesn’t even care about copyright infringement, just in the cases where big companies are inconvenienced by it and it’s done by an individual / small company instead of a mega AI corp swallowing up all copyrighted content to vomit out their own spin on it through algorithms.
Federal investigations tend to only go after big fish yes.
The root problem is the IP laws that congress passed. There will always be large pressure on law enforcement from the industry if you give them that leash.
Both are true. For a long time the Libertarian party was seen as drawing away small numbers of protest votes from the GOP, being populated by (mostly) guys who rejected Democratic over-regulation and nanny-statism but also rejected the GOP's anti-abortion politics, criminalization of drugs etc.
A few years ago there was an organized effort to capture key roles in the Libertarian party and focus the organization more on property rights and capitalism, with less emphasis on personal freedoms and constitutional limitations on government. This effectively split the Libertarian party, neutering it as electoral factor.
Now, the Libertarian party never mustered a large share of the vote, but many electoral contests are won at the margins. They managed to get ~3% of the vote in 2016, but lost >80% of that over the following 2 elections.
This seems more likely - how many libertarians are there in the US? Surely there are much larger groups you can appeal to if votes are what you're after.
> Libertarians are like independents except no one wants to try to win us over
Because they are not like independents. Democrats have moved so far left that it's not even a question of who libertarians will vote for. The candidate just needs to show them a little attention so they remember to register and vote.
The Libertarian party got ~4.5 million votes in 2016. Getting some of those votes, or dissuading them from voting, is enough to make a difference in a tight race. See my other answer upthread for more context.
I don't think the crypto crowd was ever at hazard at not voting for Trump, so I'm not sure what the advantage would have been with respect to them. However, the libertarian crowd was.
As a libertarian voter, the pardon for Ross was the only thing Trump did that actually gave me pause. To the point, I felt immensely guilty for not voting for him when I voted (L) because I knew[thought] I was damning Ross to a jail cell. It weighed on my conscience for a long time after the vote, and it wasn't until Trump won that I felt somewhat absolved of the guilt.
My personal opinion regarding the Ross pardon is that the Libertarian Party sold its soul for a donut. They could have gotten way more out of Trump than pardoning one particular Internet drug dealer.
Look, when Jeffrey Bezos and Larry the Lawnmower ask Trump for an FBI investigation and send him over another solid gold turd or whatever bribe is fitting for such a request, they expect results.
Oh please. Ross was no saint by any stretch and it does look like he may have made a very dark decision at one point, but it didn't happen in a vacuum. There's a mountain of details and nuance around that case, including a whole host of law enforcement abuses that many people would find distasteful if not sickening if they actually got the whole story.
Entrapment. The FBI posed as a user and convinced Ross that some people needed to be taken out, offered to do it, arrested him, dropped the murder-for-hire charges because they didn't want to play that game in court, knowing it would backfire, while still using those (unconvicted) charges to publicly smear him and influence the judge's decisions, and finally stole his Bitcoin.
Two agents went to prison over this. Those same agents have a history of fraud and abuse.
So -- I can't imagine how one could expect to run a massive drugs-and-arms bazaar and not go to jail forever for something or other. But. I think the surrounding circumstances gp's alluding to might have involved [0] and [1] (with a fairly colorful slant). I'm inclined to give a little weight to the colorful account since the agents in question actually went to jail for the massive theft; for a more neutral treatment, Justice [2] and Vice [3] cover the situation.
The basic claim being that the salacious murder-for-hire bit was 1) never tried or proven, and 2) was allegedly instigated in part by federal agents (operating out of unrelated offices) and a "mentor" of Ulbricht's. In reaction to one of the federal agents himself stealing $800,000 from the criminal enterprise for himself in the course of his investigation. Or something like that.
I'm not clear how that squares with Ulbricht going on to order five more imaginary executions, but the whole thing seems awfully sordid from every angle.
They didn't pay the bribe or tickle the diapered royal pink starfish properly. The Just Us system operates on the principle of favoritism with selective privilege/retribution rather than consistent fairness. They're perfectly fine having the DNI being a Russian mole and 47 rolling out the red carpet for a sanctioned war criminal.
In this day and age, MAANG, lacking integrity and values, bet on flattery and bribery as business expenses to ensure favorable treatment instead of being punished.
Exactly. Now that the facade of being a country with integrity and equality is thoroughly shattered, good luck getting public support for shutting down a website that lets people read news for free. Shit's on fire yo; we got bigger problems than that.
IMO this is more an indictment of the president being able to pardon (any president, not just current one).
IMO the president should, at best, be an additional appeals round. (But probably just not involved in the Judicial because separation of powers is good)
It is pretty sad that this is happening and that it apparently is at risk of just disappearing soon. I understand there are a lot of ethical concerns with that site, but if I use like the Internet Archive's Wayback Machine to try to save some specific documentation pages for certain proprietary software, it absolutely fails to actually save the content. So then it is just a bit more difficult to save a particular knowledge base article before it might get rewritten or updated.
Dumb question: why do news websites have such a hard time keeping users logged in? Like I can go an entire year without getting logged out of gmail. But can't go more than a few days before getting logged out of news websites.
I have subscribed to news sites and still use something like archive.is because it is faster than my paid experience.
Someone should make a site like archive.is that runs the saved page through an LLM to summarize the main points, and perhaps extract a few critical quotes (unfortunately, at the LLM's discretion, but better than nothing). The law is their greatest enemy.
No, nobody should. Don't know why anyone trusts the gossip slop bots given the extra work required.
Pretty brave to trust another word-guessing bot when it's unable to stop making stuff up.
Might want to ask it about Smith-Mundt 2012. Bill says discern, a lot, for a reason.
Since you refer to the darkweb: the gov has extensively studied Tor, likely has zero-day exploits for the Tor browser, and operates a bunch of Tor relays. Given enough time and effort it is very much possible for state actors to identify Tor users.
But unless you are a high profile gov target, Tor protects you well.
Well it is of course not possible to 100% prove that Tor protects your privacy.
But the lack of evidence is also evidence. While we have evidence that the gov was able to identify individuals using Tor, e.g. hosting drug portals, there have been no reports that individuals or companies are able to de-anonymize Tor users.
How often do we read articles about authorities attempting to unmask suspected criminals? That the FBI is trying to identify this person seems to be noteworthy to the point that there are many people here commenting on the story, and the fact that they are trying and have not yet seems to indicate that it is not a foregone conclusion.
I took the time to read that document a while back and it almost certainly isn't the correct guy. At the very least it provides 0 evidence other than concluding that "he must be the guy" due to his name, country of origin and programming background.
Yeah I just read through it and it presents absolutely no useful evidence. They establish that there's a developer in the US called Denis Petrov. They establish that someone involved with archive.today is often referred to as Denis Petrov. Then they make some weird leaps to conclude that they must be the same person.
A quick web search suggests Denis Petrov is not at all a unique name. Just because one of them wrote a somewhat feminist thought on a blog in 2004 and another forked a... let's call it "satirically feminist" project on GitHub does not in any way suggest they are the same person.
I’m not certain either way, but part of the document tries to make a big deal about some GitHub profiles having the “arctic code vault archive” badge, and implying that has something to do with running an archive website.
Pretty much anyone who has made any kind of commit to an open source project has that badge.
read the same PDF a year or so back when someone spammed it across the archive.is blog, laughed when i got to that bit - it's pretty clear the person writing it doesn't know anything about development
edit: it's incredibly naive of them to immediately trust the WHOIS results. i can say from experience that these are never checked
Looks like `archive.is` is currently using reCaptcha. So Google might be able to figure out and tell the FBI who runs it. (If not by data around the registration, then by data around accesses to the site that seem to be by a developer of it, coupled with their cross-site tracking data.)
I've also seen Cloudflare similarly in the loop, and they have similar cross-site tracking data.
Lesson: The same third-party tech surveillance companies to which you sell out all your visitors, can also violate you.
Coincidentally, their adoption of Google CAPTCHA (along with requiring javascript) is why I stopped using archive.today. I don't particularly want either of those entities executing mystery code in my browser, or on my computer at all.
Helping Google to collect records of my reading habits is also unappealing.
While strangely unpopular here, Yasha Levine's[0] well documented premise is that the entire existence of the internet is designed for surveillance and content control, down to the chip level, and this is mandated and enforced through laws as well as more covert agreements.
It's strangely unpopular because it's wrong in the one place techies care about: the details.
In broad strokes, it's true to say that the Internet was created as a surveillance and control tool. But this was not a big design up front with those goals as built-in capabilities. There's nothing in TCP/IP you can point to and say, "yes, this is the surveillance bit", or "yes, this is the government control bit". "Down to the chip level" is just plain wrong. Yes, you could argue that the Internet was enabling those things, but that's true of all communications technology, if not just the basic concept of human socialization[0].
And, in practice, if the US had actually intended for the Internet to be a surveillance and control tool, it was sure as shit really fucking bad at making use of it. The only country that actually realized it needed to censor the Internet to maintain cultural/social hegemony was China, which is why they got into network censorship early. By the time America realized it wanted that level of control it had to outsource the wetwork to creative industry and advertising companies.
[0] Most neurotypical people fail to recognize this.
> it's true to say that the Internet was created as a surveillance and control tool
I'd even argue against that, unless you got some strong evidence supporting such a claim. The mere presence of potentially traceable (IP) addresses is a technical necessity of point-to-point data transfer as contrasted to broadcast.
Techies really like to believe that they are building a bright future for humanity. Telling them that what they build is a high-tech concentration camp won't be received well.
This does not fit well with the current Cloudflare initiative.
Either a nation's jurisdiction extends beyond its physical borders wherever there is a connection in digital space, or it does not.
If the former, EU regulations do apply to American companies and they have to comply or leave the market and make sure that their offerings are not available here.
I discovered just yesterday that Verizon home internet blocks archive.is. Changing the router DNS from their default to openDNS fixed the problem for me, so it looks like they made only a nominal effort to block it.
This issue seemed to resolve itself sometime in the past year. I’m not sure if that’s because Cloudflare decided to surrender some of my PII in exchange for eDNS resolution or if archive.is finally stopped demanding it from them.
If I read that word salad correctly, Cloudflare says they're blocking it because they want to "protect the privacy" of users who do a DNS lookup of archive.today, to prevent the requester's IP address from being revealed to the archive.
That seems ludicrous, given that after a DNS lookup, the next thing anybody does is to send an HTTP request, which obviously reveals that same IP address to the archive servers.
So it's an obvious and blatant lie by Cloudflare, and I wonder what their real reason is.
Is there an easier way to get around all the complicated cookie selection? I don't care if they have 183 trackers. Do I need all those? Are they important to me? I suppose they are important to them. Isn't there just a 'no to all' or at least a 'just the bare minimum for state management'?
I will be devastated if this site gets taken down. I subscribe to pinboard.in for personal website bookmarking but even that is not 100% guaranteed to successfully cache a copy of the page.
They're fighting the wrong enemy. News content is such low quality that archive.is is the only enjoyable way to consume it. Their articles aren't worth wading through relentless popups.
If I'm curious about a fact or story, it's ChatGPT. If someone sends me a link, it's archive.is. When archive.is goes away, I'm never going to see a CNN, NYT, LA Times, etc. logo again.
I doubt this has anything to do with copyright law. I'm certain it has everything to do with certain things needing memoryholing and archive.* operators' lack of compliance.
And this very news site's settings are "Data processing by advertising providers including personalised advertising with profiling - Consent required for free use," funnily enough
This might not be about copyright. I generally avoid these mirror sites because they seem like the perfect opportunity for watering hole attacks. The challenge with a normal watering hole attack is that you have to control the site in question either by hacking or infiltrating it. Imagine however if you were able to act as a middle man to the most popular websites in the world, and people would voluntarily post links to your site all over the internet, including very valuable audiences (like HN). You would have free rein to selectively inject malware to just readers at targeted IP blocks, minimizing chances of detection because most users would never be served malware. The possibilities are endless, government espionage, corporate espionage, activists, political opponents.
To be clear I have no reason to believe specific instances of these sites are malicious, but I would be shocked if black hats weren't trying to get into this space in general.
For sure, you shouldn't just trust whatever random mirroring site pops up (in fact, you probably should trust almost none of them), but archive.is has established themselves pretty credibly IMHO. At some point it could turn, but I don't think we should kill them now just in case they turn at some point.
The fact that the FBI is involved, and given the insane amount of IP protection racket stuff going on, I think it's pretty highly likely this is all about copyright. I think the powerful interests care more about copyright than they do about most other things.
Maybe, but the subpoena doesn't shed light on what they are being investigated for. It is only demanding information.
The FBI could be investigating them for archive.today, they could be investigated because of that apparent botnet, they could be investigating them because some billionaire media mogul friend of the current POTUS is outraged at the loss of revenue. To the best of my knowledge, the reasons aren't public.
Still, it doesn't mean we shouldn't be asking questions or expressing concern over this.
> Another private investigation from 2024 comes to a different conclusion. It names a software developer from New York as the alleged operator. According to this investigation, following the trail to Eastern Europe proved to be a red herring.
Any pointers to what this "private investigation" is? The other linked blog pointing to Russia (or at least a Russian) seems pretty convincing:
It doesn't seem very convincing in its conclusions, but has some interesting information nonetheless. I searched for some info on this doc, and it seems that its author really did hire some private investigators, I even found gofundme for it and places where the author asked for help in the early stages of their investigation. It seems they were trying to find the website owner because he hadn’t responded to requests to delete some personal things archived by a prolific stalker.
> There are also indications that the operator(s) are based in Russia.
That's long been my assumption.
What I haven't known was whether this was good Russian people (culturally valuing literature and intellect) wanting to be able to access articles that they can't afford.
Nor whether it was or could become something sketchier (e.g., feeding spy databases, or one nice Chrome zero-day and strategic timing away from compromising engineering workstations at most US tech companies where an employee reads HN).
But what actually bothers me about the misc `archive.*` sites is how HN routinely uses them, for US tech company workers to circumvent paywalls for struggling journalism organizations. This piracy practice seems to have the unofficial blessing of the US tech investor firm that runs and moderates HN. Besides whatever laws this is breaking, subjectively, it feels to me like crossing an ethical line, and also (economically) like punching down.
I only very rarely read articles from these sites, and on these occasions I tend to get a lot of value out of it. I’d be happy to pay a dollar (or a couple bucks) per article. But I tend to forget about my subscriptions, and don’t read enough articles to justify it, so I’ll be wasting a lot of money buying it.
The problem with the paywalls is that everyone offers a subscription. If you want to read a single article you do not want to subscribe to some US newspaper.
Just because they don't 'want' to, doesn't mean it's not a good solution. Clearly they're not getting me to pay for their whole site subscription so why not just sell me the article? The biggest issue with most of these services is the lack of consumer 'ease' by which the creator can actually get paid. It's the same reason why I'm seeing all my friends go back to piracy, it was nice when we had a consumer convenient place to consume our content. Just look at Steam, it's easier than ever to go pirate the majority of games, but my Steam library just keeps getting bigger. I'm not against buying things, I'm against shitty services.
The internet is fundamentally different than print though—perhaps this fundamental change to journalism requires another way to pay the bills. (Advertising is the obvious one.)
Or maybe we, as a society (because of our internet ways) simply don't deserve these services any longer.
Perhaps the internet itself is the problem. What if instead that was the big mistake after all?
The FBI is attempting to unmask the owner behind archive.today, a popular archiving site that is also regularly used to bypass paywalls on the internet and to avoid sending traffic to the original publishers of web content, according to a subpoena posted by the website. The FBI subpoena says it is part of a criminal investigation, though it does not provide any details about what alleged crime is being investigated. Archive.today is also popularly known by several of its mirrors, including archive.is and archive.ph.
The subpoena, which was posted on X by archive.today on October 30, was sent by the FBI to Tucows, a popular Canadian domain registrar. It demands that Tucows give the FBI the “customer or subscriber name, address of service, and billing address” and other information about the “customer behind archive.today.”
“THE INFORMATION SOUGHT THROUGH THIS SUBPOENA RELATES TO A FEDERAL CRIMINAL INVESTIGATION BEING CONDUCTED BY THE FBI,” the subpoena says. “YOUR COMPANY IS REQUIRED TO FURNISH THIS INFORMATION. YOU ARE REQUESTED NOT TO DISCLOSE THE EXISTENCE OF THIS SUBPOENA INDEFINITELY AS ANY SUCH DISCLOSURE COULD INTERFERE WITH AN ONGOING INVESTIGATION AND ENFORCEMENT OF THE LAW.”
The subpoena also requests “Local and long distance telephone connection records (examples include: incoming and outgoing calls, push-to-talk, and SMS/MMS connection records); Means and source of payment (including any credit card or bank account number); Records of session times and duration for Internet connectivity; Telephone or Instrument number (including IMEI, IMSI, UFMI, and ESN) and/or other customer/subscriber number(s) used to identify customer/subscriber, including any temporarily assigned network address (including Internet Protocol addresses); Types of service used (e.g. push-to-talk, text, three-way calling, email services, cloud computing, gaming services, etc.)”
> YOU ARE REQUESTED NOT TO DISCLOSE THE EXISTENCE OF THIS SUBPOENA INDEFINITELY AS ANY SUCH DISCLOSURE COULD INTERFERE WITH AN ONGOING INVESTIGATION AND ENFORCEMENT OF THE LAW.
Is this actually a mere request, as in the receiver is _not_ required to avoid disclosure?
Isn't the whole thing a request? The FBI has no power in Canada unless they go through Canadian legal channels, no? If I received a subpoena from a foreign sovereign I would just use it as toilet paper.
They probably cannot require this. They may be able to get you on interfering with their investigation if you disclosed with the intent of interfering. Probably adding this notice helps them prove you were aware of the potential to cause interference, at least. IANAL.
As a Canadian, I really hope Tucows is going to send a particularly nasty response to the FBI. Canada should never collaborate with any US authorities!
> Canada should never collaborate with any US authorities!
Cross-border collaboration is a good thing. Our agencies regularly collaborate to bring people who feel insulated and emboldened to account for their crimes. This works both ways.
As someone who has dealt with media of me as a minor (~around 11/yo) from Omegle being shared across the internet, the role archivers play in keeping illegal content “alive” isn’t well recognized. Thankfully, the Internet Archive has a matured process to purge pages that host illegal content.
We do not know what the investigation is for. All is up to speculation. Not all investigations are bad.
Here is an example on archive.is. I submitted multiple complaints to NCMEC but didn’t get results. Germany, though, was able to get the archives purged.
> In response to a request we received from 'jugendschutz.net' the page is not currently available.
That page held many, many images of minors. It is good that it is gone.
August 12, 2025 - Canadian Man Sentenced to 188 Months for Attempted Online Enticement of a Minor and Possessing Child Pornography [1]
August 21, 2024 - Canadian National Extradited To The United States Pleads Guilty To Production Of Child Sex Abuse Material And Enticement Of Minors
December 20, 2024 - Extradited Canadian National Sentenced To Life In Federal Prison for producing child sexual abuse material and enticement of a minor [3]
IMO it's only a good thing when it's a good thing. There are plenty of reasons it could be a bad thing too. For example, Edward Snowden probably would have been hanged by now if Russia collaborated across borders.
One can only imagine the sharing and reading histories the operator has accumulated on the people using it. No restrictions on how the operator can use that data
Records of every URL submitted and accessed by a given IP address+browser fingerprint^1
1. Archive.today sites use a form of "pixel tracking" to collect IP addresses via popular graphical browsers, the ones that are required to be used to solve CAPTCHAs, that automatically request URLs in HTML tags with the "src" attribute, e.g., "img" tags
2. Archive.today sites serve CAPTCHAs to some users^3 which force them to enable Javascript and share browser information
3. For example, those users not using or appearing to use popular graphical browsers
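To make footnote 1 concrete, here is a minimal sketch of which URLs a graphical browser requests on its own while rendering a page. The sample HTML, the `tracker.example` pixel URL, and the tag list are all made up for illustration; nothing here is taken from archive.today's actual markup.

```python
from html.parser import HTMLParser


class AutoFetchCollector(HTMLParser):
    """Collects src URLs from tags that a graphical browser fetches without any click."""

    # Hypothetical, non-exhaustive tag list chosen for this illustration.
    AUTO_FETCH_TAGS = {"img", "script", "iframe"}

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag in self.AUTO_FETCH_TAGS:
            for name, value in attrs:
                if name == "src" and value:
                    self.urls.append(value)


def auto_fetched_urls(html):
    """Return every src URL the page would cause a graphical browser to request."""
    collector = AutoFetchCollector()
    collector.feed(html)
    return collector.urls


# Made-up page: one 1x1 tracking-style image and one ordinary link.
SAMPLE_PAGE = """
<html><body>
  <p>Archived article text.</p>
  <img src="https://tracker.example/pixel.gif?page=abc123" width="1" height="1">
  <a href="https://example.org/other">a link the browser does not fetch on its own</a>
</body></html>
"""

for url in auto_fetched_urls(SAMPLE_PAGE):
    print(url)
```

The `img` request happens just by rendering the page, so whoever serves that URL sees the reader's IP address and request headers. The `a href` is only requested if the reader clicks it.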
What histories? Does archive.is take your email, phone, credit card, and passport pic when you want to read anything? The most there is is just an IP address in the server logs, for most users, rotated by their ISP on regular basis, easily obscured with a VPN.
This need to make IP-infringement sound ominous by invoking some ill-defined spy plot is a tired cliche.
There are many mechanisms, widely used, to aggregate information from many sources into a profile of you, and using your IP as an identifier isn't hard. Many lawsuits find their targets based on IP addresses, for example.
> easily obscured with a VPN
I think we can expect that commercial VPNs are compromised, at least by intelligence services. Imagine you opened a bar and advertised, 'dissidents come here to drink in privacy'. I'm sure you'd attract others too to an obviously target-rich environment.
When the proxy operator is (1) anonymous, (2) the potential target of coercion, and (3) outside the user's jurisdiction, how does a proxy user verify (a) whether the operator collects, or is being forced to collect, data or (b) what it might do, or be forced to do, with collected data?
I'm concerned that, even here on HN, we are underestimating both the magnitude of this looming conflict and also its inevitable conclusion.
The internet is bigger than you and me, and it's bigger than computers. It is an evolutionary force. It is not going to be stopped, and certainly not by states whose popularity and authority are waning so rapidly.
Moreover, the internet keeps as its core function the proclivity to copy and store bytes, and from this very simple mechanism emerges a large set of tools and norms that supplant nearly all of the ways that nation states perpetuate power.
What we desperately need are the elder statesmen and women to stand up and soberly see the writing on the wall, and gracefully deprecate the systems of which they will soon lose control, starting with nuclear weapons.
I don't believe this has to end in violence or acrimony of any kind. But we have run out of time to act like petulant children, crying that somebody took our empire away.
Very soon (ie, in the next couple centuries, maybe sooner), a few small nation states will adopt frameworks of zero IP, allowing all the content of the internet to be housed there, and from there, accessed by the entire planet.
Some other nation states may attempt some kind of embargo or sanctions, but these will obviously fail, just as the attempts by Russia and China to censor the internet within their borders have failed (and are failing with greater volume with each passing day). And before you cry that adoption in China is too low to support this conclusion, consider that the work to resist the GFW has begat some of the best networking tools in the world, with rapidity of evolution increasing, not decreasing. Even if fear and violence can stem adoption for a few decades or even centuries, the toolchain continues to grow and will tip the scales over sufficiently long time scales.
Let's not let this become a world information war. Let's install peace now while it's still easy to do. Let's dispense with IP and live in a world of joyous open access to information.
>Archive.today launches real browsers (not even headless) and tries to load lazy images, unroll folded content, login into accounts if prompted with login form, remove “subscribe our maillist” modals
There are some tricks which work for different websites - for example, for NYT it's enough to manually clear nytimes.com cookies, FT used to work after click from twitter/x and so on. So I guess there is some set of heuristics.
It seems that archive.is often has the full article for sites that are completely paywalled to every non-paying visitor: no cookie-driven freebies, nothing.
Publicly revealing everything they are doing would be a strategically bad idea, obviously.
It's not inconceivable that they actually pay for access to some of the sites; it wouldn't be surprising.
They are not actually bypassing paywalls, so I think they are on ethically good grounds. Those sites show their full text to web crawlers, just not to humans. Basically, archive.is and similar sites simulate a crawler through various means: headless browsers, better agent injection, etc.
I always assumed so, because Google can index them full text. It used to be the case that you could see those full snapshots in Google's cache. This was before the sites strong-armed Google into making those snapshots inaccessible; then the archive.* folk rose to power. You can test this yourself by searching for a unique quote from those sites and still getting hits in Google. But you are right: why could this not be achieved with a plugin, then? I don't know.
Cloudflare's DNS actually hasn't worked with archive.today for >5 years, due to the site returning bad results in response to Cloudflare not sending EDNS subnet info. HN comment from someone at Cloudflare: https://news.ycombinator.com/item?id=19828702
> Archive.is’s authoritative DNS servers return bad results to 1.1.1.1 when we query them. I’ve proposed we just fix it on our end but our team, quite rightly, said that too would violate the integrity of DNS and the privacy and security promises we made to our users when we launched the service.
> The archive.is owner has explained that he returns bad results to us because we don’t pass along the EDNS subnet information. This information leaks information about a requester’s IP and, in turn, sacrifices the privacy of users. This is especially problematic as we work to encrypt more DNS traffic since the request from Resolver to Authoritative DNS is typically unencrypted. We’re aware of real world examples where nationstate actors have monitored EDNS subnet information to track individuals, which was part of the motivation for the privacy and security policies of 1.1.1.1.
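For anyone wondering what "EDNS subnet information" looks like on the wire, below is a rough sketch that builds the EDNS Client Subnet option as laid out in RFC 7871 (option code 8, address family, source prefix length, scope prefix length, truncated address). The /24 prefix and the `203.0.113.77` client address (a documentation range) are assumptions picked for the example, and `build_ecs_option` / `leaked_subnet` are illustrative helper names, not part of any resolver's API.

```python
import ipaddress
import struct

ECS_OPTION_CODE = 8  # EDNS Client Subnet option code (RFC 7871)


def leaked_subnet(client_ip, prefix_len):
    """The network an authoritative server learns about when ECS is sent."""
    network = ipaddress.ip_network(f"{client_ip}/{prefix_len}", strict=False)
    return str(network)


def build_ecs_option(client_ip, prefix_len):
    """Build the raw ECS option bytes a resolver would attach to its upstream query."""
    addr = ipaddress.ip_address(client_ip)
    family = 1 if addr.version == 4 else 2  # 1 = IPv4, 2 = IPv6
    # Zero out the host bits so only the announced prefix is included.
    network = ipaddress.ip_network(f"{client_ip}/{prefix_len}", strict=False)
    n_bytes = (prefix_len + 7) // 8  # address is truncated to whole octets
    addr_bytes = network.network_address.packed[:n_bytes]
    # Scope prefix length is 0 in queries; the authoritative server fills it in.
    payload = struct.pack("!HBB", family, prefix_len, 0) + addr_bytes
    return struct.pack("!HH", ECS_OPTION_CODE, len(payload)) + payload


option = build_ecs_option("203.0.113.77", 24)
print(option.hex())
print(leaked_subnet("203.0.113.77", 24))
```

The point of the dispute is visible in `leaked_subnet`: with ECS the authoritative server sees the client's network rather than just the resolver's address, and the resolver-to-authoritative hop is typically unencrypted.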
"Infamous"? About as infamous as heise.de. Weird framing. Many people do not like the past being available for reference when they lie about in the future. And that's what this federal attack stems from.
"who controls the past controls the future: who controls the present controls the past"
If you are OK with getting your information from one or two sources - why not. You can also subscribe to a newspaper. But surely internet can do (and did until relatively recently!) better than that.
Paywall epidemic is a recent phenomenon, internet media managed to exist before that.
Oh no, how will so many people on HN fish for karma if they can’t contribute to conversations with an archive.whatever url that takes 2 seconds to generate on your own?
Not if it isn't available in my country, I can't. Personally, I'm grateful if somebody on here provides an archive link to an article I otherwise cannot read without additional technical intervention. To label it karma fishing is really uncharitable.
The FBI is conspiring against the owner of archive.is
I feel bad for the owner. He must be telling his friends "The FBI is out to get me" and they must think he's insane and they try to get him institutionalized...
The psychiatrist will note "Patient has delusions of grandeur; he thinks he is the owner of Wayback Machine and that the CIA is after him. Diagnosis: Paranoid Schizophrenia"
The FBI should investigate the "AI" companies and also the demise of Suchir Balaji, a copyright whistleblower who according to a sloppy local police investigation committed "suicide" hours after being seen cheerfully collecting a doordash delivery on CCTV.
I'm so confused by this power dynamics. So you can torrent movies just fine from Meta/Google office in Germany? No matter how you look at it, the White House has holes in the walls as in Idiocracy.
I hope someone someday does a deep dive into when archive.org was taken down for a few weeks by hackers from a "pro-Palestinian" group. It felt like a black propaganda attack, especially with the very tame videos they shared on social media about the crimes their enemy committed (videos of buildings being blown up instead of innocent children).
It could be for something far more petty, like covering up speaking gaffes. When you give petty image-obsessed people a lot of power, they’ll use it for petty, image-preserving reasons.
As someone who has been the target of an FBI investigation for what was effectively criminal copyright infringement (later arrested and did time in prison), my only takeaway is that this, if anything, should just be a civil suit just like so many other similar cases of copyright issues.
In my personal experience, the priorities of the FBI are typically highly politically motivated. The exceptions are if you’re doing something seriously icky, or doing fraud that deceives people.
For those interested in what’s reported and what actually happens, I’ve made some comments on my case and my experience here: https://prison.josh.mn
>There's a certain freedom in owning your story publicly. People can't weaponize what you've already made peace with. I think that's what I'm motivated to do here.
Really nice. It also builds some credibility currency, the reputation economy is not as punitive in your case as I thought it would be.
I don't get it. You are linking an article which just says that you've deleted the original article.
Here's your actual account: https://news.ycombinator.com/item?id=45451567
edit: apparently also here: https://prison.josh.mn/self
The wording of the landing page makes it sound (at least to me!) like the content is no longer there.
I was confused at first too - the story is in sections accessed at the top
Just here to say that's a banger of an url prison josh
That was a great read
FBI wants to remind everyone that only US mega-cap companies can scrape the entire internet, not share the data, use it to train AI models and then charge people to use chatbots that use this 'laundered' data. Anybody else attempting to do this is a criminal in their eyes and must be punished.
Archive should just rebrand as an AI start up then offer an 'llm' that is suspiciously 'over-trained' and happens to spit out the site you query exactly... Copyright infringement? Nay! Over-training! "A fix is coming soon™!"
See NYT vs OpenAI
Not just that... they want to control and erase history, so they fully control the narrative. Same as the Roman Empire did with the crusades and burning of books.
I thought the crusades came after the end of the Roman Empire
The FBI is also used by various organizations to investigate various crimes, whether real or not. For instance, the Bureau of Industry and Security thought my company was flouting export regulations and had the FBI raid my company to investigate. It turned out to largely be due to a paperwork problem, but the BIS didn't have the power to investigate so they contracted the FBI to provide the manpower to do the raid. So who we saw was the FBI, but it was really the BIS originating the raid.
Just a note: the White House also uses archive.ph.
Search for “Americans are spending like never before: Retail sales are booming — up 5% over last year, far outpacing inflation — as Americans spend in record amounts.” [1]
The phrase “up 5%” links directly to archive.ph.
[1] https://www.whitehouse.gov/articles/2025/09/the-economy-is-b...
How did YOU find that out ?
You are asking dangerous questions my friend :) (yeah pretty impressive catch, maybe some llm-assisted cross-scan of gov sites)
Why would it be LLM-assisted when maps of what sites link where are part of the core WWW infrastructure? Google made a trillion dollar business out of that.
I occasionally read these articles and wanted to know what sources they use, besides websites like The Daily Caller, to back up their claims. I noticed this some time ago and remembered it. But it took me a while to find the article again. ;)
A regular Google search with operators, or advanced search, will do; however, I can't guarantee Google doesn't use an LLM in the background.
But what reason might the whitehouse have to deprive reuters of traffic in such a petty way? /s
I pay subscriptions to some of these sites and still use archive.is on them because it is a more pleasant reading experience. No auth failures, no annoying popover windows begging me to subscribe to their dumb newsletter. Just the internet equivalent of a static piece of newsprint.
My personal theory is that archive.is has paid subscription accounts (legit or via botnet) to most of the major news outlets and edits the HTML to make the sites look not logged in. I wonder if they do it by hand or by doing something like: https://github.com/pirate/html-private-set-intersection
in my experience it's just a headless browser with a bypass-paywalls extension
It is definitely more than that for some sites, and it has to be manually managed. For example, this year I've seen archive.is capture paid articles from some Finnish newspapers, and the layout gives away that it is logged in on an account, although the identifying details have been stripped out.
There have been periods of weeks/months when they don't have paid access to those Finnish sites. Tried it just now on a hs.fi paid article from today and it didn't work, but for example paid articles from just a week ago seem to have been captured as a premium user.
It is curious how they have time to do it and I wonder if news sites of other smaller languages get similar treatment.
Create a VPN on a GCP IP address, use the Googlebot user agent, paywalls gone.
Probably works against a fair few sites, but not if they are using RDNS.
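The RDNS check mentioned here is usually done as a forward-confirmed reverse DNS lookup: reverse-resolve the client IP, check the hostname's domain, then resolve that hostname forward and confirm it maps back to the same IP. Below is a minimal sketch of that idea. The suffix list, the function name, and the fake resolvers are assumptions for illustration, not any particular site's implementation; the default resolvers use the standard `socket` module and only cover IPv4.

```python
import socket

# Hostname suffixes treated as belonging to Google's crawler for this sketch (assumption).
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_crawler(ip, reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS check; resolvers are injectable for offline testing."""
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        hostname = reverse_lookup(ip)
        if not hostname.endswith(TRUSTED_SUFFIXES):
            return False  # e.g. a cloud VM whose PTR record is under another domain
        # Anyone can publish a PTR record; the forward lookup must agree.
        return ip in forward_lookup(hostname)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False


# Offline demonstration with made-up DNS data.
fake_ptr = {
    "66.249.66.1": "crawl-66-249-66-1.googlebot.com",
    "203.0.113.77": "77.113.0.203.bc.googleusercontent.com",
    "192.0.2.9": "crawl-fake.googlebot.com",
}
fake_a = {
    "crawl-66-249-66-1.googlebot.com": ["66.249.66.1"],
    "crawl-fake.googlebot.com": ["198.51.100.5"],
}

for ip in fake_ptr:
    print(ip, is_verified_crawler(ip, fake_ptr.__getitem__, lambda h: fake_a.get(h, [])))
```

In this made-up data, the cloud VM fails because its PTR record is not under a crawler domain, and the spoofed PTR fails because the forward lookup does not point back at the requesting IP. The user-agent header plays no part in the check, which is why changing it alone does not help against sites that do this.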
Sharing articles I like to people I know is the other reason, when the site doesn't have any functionality for that.
I used to do the same with Lynx but enough websites have now broken it.
ublock with annoyance filters also solves this
Is there any more annoying popup than the newsletter popup? I'd rather see a targeted ad than that BS.
NO! I do not want your newsletter! I wouldn't even have an email address if it wasn't absolutely required to operate in society today. The less email I get, the better!
Email is becoming like fax machines: An old, dated technology that refuses to die.
> Is there any more annoying popup than the newsletter popup?
Rhetorical I'm sure, but actually yes! The popup that tries to get you to switch to the app when you are actively trying to give the offender money! eBay is a notable offender here (it pops up when I search for stuff to buy; why would you interrupt that?)
Personally I don't mind an offer to subscribe to the newsletter, but Substack is way too aggressive. They show the prompt before I have even finished the article (how do I know if I want to subscribe?) and obscure the article (actively working against what they know I am trying to do). So I now just immediately back out when I see that. I won't visit sites that are purposely harming the experience.
Physical feels that way to me sometimes. In the US, I get assaulted on a constant basis by mailers and ads for things I never expressed any interest in. Waste of time, waste of paper, waste of resources.
Sites used to open up popup windows, then browsers got better at blocking them. But instead of taking the hint that people didn’t want that, they just moved the popups inside the page content. The prevalence of ad / tracking blockers is entirely due to the user-hostile actions of site owners.
The purpose of creating a site is to earn money, and newsletter popup is a step in a sale funnel. Text is just a bait.
Same, I also use links from HN threads and I donated to archive.is in the past. I don't want them gone.
When there are a few simple nice things making our lives a little bit more bearable, there are always other zealous assholes desperate to ruin that.
Here I speak about this site, but everyday we have new cases of that. Like "new tax on anything that starts to be popular" for France, or Google trying to kill our privacy and F-Droid by requiring all app devs to have attestation from them.
Or the Anna's archive DNS block in European countries...
https://archive.is/XdQRp
So interesting that with this link, I saw the whole article is a couple paragraphs. With the original link, I gave up after the second ad that almost covered my whole screen on mobile. Too many ads are a terrible user experience that doesn’t let us read anything.
Something new I found - go to any link (news article) on commondreams.org in Firefox. Now use the Reader View button in your address bar - they've figured out how to hack Reader View to not show the content and only a beg screen for money.
I use Brave on iOS, it mostly works and blocks ads and cookie pop ups. (Sometimes it breaks some websites but it's trivial to disable the shields.)
Ublock Origin, disable javascript for that site, remember the decision. Problem solved.
I also resorted to using archive.is when I first visited the page so that I wouldn't need to agree to their data collection for personalization.
chef's kiss
how coincidental
For the last month or so archive.is has not been working for me; is that maybe related to this? Btw, I always assumed the owner is from Russia, because he or they were so secretive about everything except the occasional blog post or Q&A, and Russians are usually obsessed with their web pet projects.
For real, someone needs to make a legit business of archiving the web, where you would get timestamped hashes of your archived web pages and "unlimited" storage for archiving (of course, only if you pay for the "unlimited" storage).
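A bare-bones sketch of what a "timestamped hash" record could look like, using only the standard library. The field names and the `make_archive_record` / `verify_archive_record` helpers are hypothetical. Note the big caveat: a timestamp you generate yourself proves nothing to a third party, so a real service would need an independent timestamping authority (RFC 3161 style) or some other external anchor.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_archive_record(url, page_bytes, captured_at=None):
    """Describe one capture: what was fetched, when, and a SHA-256 digest of the bytes."""
    captured_at = captured_at or datetime.now(timezone.utc)
    return {
        "url": url,
        "captured_at": captured_at.isoformat(),
        "sha256": hashlib.sha256(page_bytes).hexdigest(),
        "size_bytes": len(page_bytes),
    }


def verify_archive_record(record, page_bytes):
    """Check that stored bytes still match the digest recorded at capture time."""
    return hashlib.sha256(page_bytes).hexdigest() == record["sha256"]


# Made-up capture for illustration.
page = b"<html><body>Example article</body></html>"
record = make_archive_record("https://example.org/article", page)
print(json.dumps(record, indent=2))
print(verify_archive_record(record, page))
print(verify_archive_record(record, page + b"<!-- edited later -->"))
```

The hash only shows that the stored copy has not changed since capture. It says nothing about whether the capture matched what the origin server actually served.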
The subpoena cites the following statute as authorization: "(1)(A) In any investigation of (i)(I) a Federal health care offense; or (II) a Federal offense involving the sexual exploitation or abuse of children, the Attorney General; or (ii) an offense under section 871 or 879, or a threat against a person protected by the United States Secret Service under paragraph Secret Service determines that the threat constituting the offense or the threat against the person protected is imminent"
One of the agents named in the subpoena appears to have previously worked on child exploitation cases years ago:
https://www.supremecourt.gov/DocketPDF/22/22-6039/245948/202...
Now that might be an interesting angle.
1. Put up CSAM on your unlisted domain briefly.
2. Archive page and delete site.
3. Send people archive link.
I can confirm that (something similar to that) is what regularly occurs to share CSAM.
Have dealt with content from when I was ~11 on Omegle appearing across the internet for years at this point (NCMEC is an amazing resource).
Archive sites are regularly abused by bad actors.
Here is a real example on archive.is:
https://archive.is/https://ezgif.com/maker/*
I submitted multiple complaints to NCMEC but didn’t get results. Germany, though, was able to get the archives purged.
On the page, you will see the text:
> In response to a request we received from 'jugendschutz.net' the page is not currently available.
I think the owner mentioned in a blog post (or on Twitter?) that this is indeed happening, but I forgot the exact wording to google it.
UPD Found this by googling "site:blog.archive.today abuse":
https://blog.archive.today/post/117011183286/yesterday-i-did... (2015)
That seems like something that should be handled with a simple takedown request, and those behind archive.is would almost certainly comply. 99.999% of people using archive.is are using it to bypass news article paywalls, nothing more. Which, if we're honest, is the real reason why the FBI is going after them.
Personal anecdote but I almost never use these archive sites to bypass paywalls. I only use it when I want to see how establishment news sites somehow sometimes accidentally tell the truth, then, when they get the call, they try to purge their original reporting. Again, it might be my personal bias, but in my opinion, this is the main reason they are going after them. Because these websites let people prove the hypocrisy and the lies.
I remember that when[0] Reuters took down that one story about organized crime, and further DMCA'd the Internet Archive to take down their version, archive.ORG cheerfully did the memory-hole thing—while archive.IS stayed up.
If the (Western) internet were to turn into a monoculture of Western-domiciled big corporations, that kind of censorship would be *effective*. Our systems aren't robust against bad-faith actors attacking the free flow of information. (And the root cause of the planet-spanning censorship cascade in that example was, unambiguously, bad actors. A crime syndicate based in India).
The fact the internet is global and freely connects to legal jurisdictions and cultures very different from the West's, is to the West's benefit: it creates an escape-hatch for things that fall between the cracks of our nascent totalitarian technologies.
[0] https://news.ycombinator.com/item?id=39065981#39065996 ("A Judge in India Prevented Americans from Seeing a Blockbuster Report")
If the feds want to take someone down, one of the dirty tricks is to use CSAM as a pretext for an investigation and subpoenas.
I get freaked out when I consider the future of archive.is. Thanks to the nature of the web today, it is incredibly fragile.
As the co-creator of a censorship-resistant publishing platform, I really wish we would migrate to a peer-to-peer technology. We could develop network effects on a decentralized platform with a cryptographically-provable network of trust. Most people don't realize it is possible to handle media distribution in a robust way.
I'm not just trying to shill my solution! I wish there were more competitors using these techniques to try and save the web.
Except a lot of people wouldn't participate in a peer-to-peer network for fear of legal repercussions.
Utilizing p2p tech is not illegal. It is illegal to redistribute copyrighted content without authorization- and we are working to build this into the protocol so that peers will respect copyright by default. People can redistribute at their own risk. I'll be the first to admit that this is complicated, and we have a long way to go in this regard.
Plus, the vast majority of people will just use the web frontend, with a peer on the server. Most peers can be hosted by content creators and tech-savvy friends+family.
Almost every machine in the world participates in at least one peer-to-peer network: Windows Update. There was a time when the Steam client also used BitTorrent technology, not sure if they still do.
Obviously P2P gets used in various things, my point was just, that (most) people likely won't willingly join P2P networks to fight "censorship" or help archive things with questionable content or tainted with potential copyright infringements.
We need to preserve data. The FBI is trying to kill data.
We can not allow the FBI to work for Evil here. I actually think there should be a human right to data. With that I mean, primarily, knowledge, not to data about a single human being as such (e. g. "doxxing" or any such crap - I mean knowledge).
Knowledge itself should become a human right. I understand that the current law is very favourable to mega-corporations milking mankind dry, but the law should also be changed. (I am not anti-business per se, mind you - I just think the law should not become a tool to contain human rights, including access to knowledge and information at all times.)
Wikipedia is somewhat ok, but it also misses a TON of stuff, and unfortunately it only has one primary view, whereas many things need some explanation before one can understand them. When I read up on a (to me) new topic, I try to focus on simple things and master these first. Some Wikipedia articles are so complicated that even after staring at them for several minutes, and reading them, I still haven't the slightest clue what they are about. This is also a problem of Wikipedia: as so many different people write things, it is sometimes super-hard to understand what Wikipedia is trying to convey.
> We can not allow the FBI to work for Evil here
Historically speaking I can't see this as even being in the top 100 evil things the FBI has done.
> Historically speaking I can't see this as even being in the top 100 evil things the FBI has done.
Perhaps, but we can't change the past: we can only fight against what is happening in the present to try to get a better future.
Probably a bit of a 'baby with the bathwater' situation here. At almost no point has that institution been a net positive - at times snooping on 'political dissidents' (like MLK Jr.), and at others bungling cases so badly they become moments of national shame (Ruby Ridge).
You're never going to get a system with a clandestine domestic service running ethically for long, esp. not with qualified immunity. It's simply too attractive to dumb psychopaths with delusions of grandeur and concurrently not of interest to people with a strong sense of community or morals.
> At almost no point has that institution been a net positive
Hard to measure, isn't it? In the eyes of the millions of Americans who have at some point in their life been victims, or relatives or friends of victims, of some kind of serious crime, the FBI has oftentimes been helpful and/or the prospect of being caught has been a deterrent for crimes.
You contrast that with all the bad that has come from there, of which there is surely plenty, but how come you claim that the bad obviously must outweigh the good?
You're right that I'm taking a bit of a shortcut - my assessment is based on what I know to be true in both directions, the things they've done right versus the things they've done wrong. The CARD program, stopping the Times Square bomber, Don C. Miller, Zazi versus COINTELPRO, Stingray, MLK, Ruby Ridge, basically everything J. Edgar Hoover ever touched (like the Palmer Raids), Steven Hatfill and Brandon Mayfield.
If you ask me, I'd trade the good for enduring the bad.
My shortcut is admittedly a sloppy heuristic (because what else do you have for unknowns like this); for the unmeasurable effects, my bet is that they skew roughly the same as the measurables. For every serial killer who thought twice, there have probably been many political activists who have also thought twice. The deterrent effect cuts both ways if your actions cut both ways. We also know about enough falsely accused / imprisoned that we can assume we ain't figured them all out. For every family that feels safer with the FBI around, there are families that feel less safe, because people "like them" have been framed, murdered, snooped on, suppressed, and criminalized.
So yeah, it is hard to measure - but not impossible to come to a conclusion, as far as I'm concerned.
Another way to look at it is this; if you're going to hand the mandate of violence and skullduggery to an institution, you should be damn sure that they have standards and practices that solidly enforce competence and ethics - and even considering the good, we know pretty conclusively that they have failed in this regard. I don't want to play russian roulette with law enforcement - they should get it right almost all of the time or step aside so someone who knows what they're doing can handle it.
Well, when you hear a knock on the door and someone say "FBI, open up", do you think "thank god, some extra protection", or "oh fuck"?
If you choose to engage law enforcement personnel, it's "thank god, some extra protection" (hopefully!), but if there is a situation where law enforcement personnel engage you, it's either "huh?" or "oh fuck". This isn't different for the FBI than for local or state-level police.
If some law enforcement personnel show up that you didn't invite, they could be there for a large number of reasons. How worried you'll be depends on how likely you think they are to do what they're supposed to do instead of what they're not.
If they're canvassing for witnesses, are they going to charge through your yard and shoot your dog? If they're investigating someone else, how likely are they to try to come up with something unreasonable to charge you with for leverage and then make you plead it down to a penalty that still isn't zero in exchange for giving them information you might not even have and would then be forced to choose between fabricating to get the deal and "not cooperating" and getting a serious prison sentence?
If someone is attempting to SWAT you, how likely are they to ascertain the situation instead of shooting first and asking questions later?
If their investigation has led them to you for some reason even though you're innocent, do you expect them to care about the truth or just railroad you?
If you hear the name of a particular law enforcement agency unexpectedly when you don't have any reason to think you've done anything wrong and your instinct still has to be "oh fuck" then they're bad at their jobs.
I think most people would have essentially the same reaction to either FBI or state/local police showing up at their door with "[Police|FBI], open up!", and it depends more on whether they believe they've done something illegal than the reputation of the agency. This was my disagreement with GP (stavros).
Depending on how you expect the reader to answer all your questions, we could still be in full agreement, but my sense is that you're asking them rhetorically?
If this was true, the Miranda rights would read something like “anything you say will be used to obtain justice” rather than “anything you say can and WILL be used AGAINST you.” The police and justice system are never your friend. They are always your adversary, and should be treated as such. Under a different regime, they could be your ally if you’re innocent, (and this is the case in many countries) but in the US, they are always hostile to everyone, including innocent people. Even if individuals in that system don’t fancy themselves in that light.
> At almost no point has that institution been a net positive
The FBI's anticorruption work is good and necessary.
I assume that’s why the original argument is that it’s not been a net positive. I.e. the assumption is that lots of work can be good and necessary, while even more that is evil and excessive can end up with a net negative.
Anticorruption work is good and necessary. If the FBI's work was any good, they would be investigating the funding of the destruction of the White House, or AIPAC and Qatari influence in DC, not Comey and Obama. Right now, they are working for Evil.
Like that's happening under this administration; see Tom Homan.
Or the Trump coin crypto rugpull and money laundering scheme. Or the open insider trading. Or the $400 million jet "gifted" from Qatar. This year has been one grift after another.
Have you ever dealt with US law enforcement? They are a joke. Thinking they are a positive influence is a joke to me.
Why don't you then move to a place where law enforcement doesn't exist at all? Surely must be like paradise!
I already live there because the only enforcement that happens is trying to extract money from poor people to fund the local court and cops. Pulling over every car coming down a particular road and trying to charge them with DUIs for smoking weed 8 hours beforehand does not make me safer, it just makes me late for work and is used to justify tax increases on me to further fund the bogus drug war.
US law enforcement "clears" about 1 in 4 robberies and more than 1 in 3 aggravated assaults/batteries, and similar numbers for other crimes. On average, a criminal's career is 3 serious crimes. You can imagine how much awesomer your life would have been if they were able to run uncaught for years and years. But you won't because you have "net negative" bullshit blocking your vision.
well because intelligence agencies and drug lords take over due to more force and money.
typical bootlicker mentality; all criticism of state violence is rejected out of hand because the idea that power can and should be held to a higher standard is anathema to the authoritarian mindset.
un-nuanced and intellectually lazy.
I'm all for reasonable criticism of law enforcement, which "net negative" is not.
The laughable thing here is the "argument" that one cannot judge the societal impact of the FBI unless one has worked in law enforcement.
I'm sure police (when they aren't fighting over jurisdictional issues) find it helpful, that doesn't mean that it's helpful for the population, especially when it (and police generally) are used as a tool for domestic influence operations and to basically shunt some people aside in the name of business and landowners.
Ok then. This, while bad, is not even in the top 50 of evil deeds the FBI is CURRENTLY doing ...
i’m not entirely sure what you’re implying here.
i could absolutely be wrong since your post was kinda vague, so forgive me if i’m wrong, but are you implying we shouldn’t attempt mitigation of bad things because other bad things are happening elsewhere?
List not fifty, but just ten of those, I'd like to know.
Deletion of data is the most permanent thing most people will ever do. The burning of the library of Alexandria and the razing of Baghdad left a long, long shadow on history.
But after that we got to pretend everything was invented in Western Europe.
Most of the time when I see this snark, and look it up, it turns out that the "original" inventor did only the most basic step or vague foundations and never refined it further or explored any potential applications.
Most often it happens with China, since they spend a lot on propaganda to present themselves as the true inventor of everything.
you jest but it is wild how often people declare “if it was ‘advanced’ and outside of Europe, it was probably aliens. not the people, it was aliens, obviously.”
But once the FBI has the power to erase knowledge, the other 100 evil things will be rounded to zero.
It being #101 or #275 in a ranking doesn’t make it not evil though.
Well, being ranked #275 on the list of 5 things I'm going to care about today means that I'm going to go "hmm that's interesting" and then move on with life.
The tendency to do evil things should be the overriding concern, not the relative ranking of them.
We shouldn't let people get away with doing minor bad things just because they're not as bad as other bad things we let them get away with.
No, but it can make it unworthy of concern, the same way an egg gone bad on the kitchen counter is not very important when the house is on fire.
If one is naive to the fire in the house, the egg on the counter, as something worthy of concern, might get them to look around and see that their house is not actually suitable for use as shelter. People who are trying to put out the fire (or who are simply concerned about it while watching from a distance) might decide to point to the spoiled egg to spread awareness of the fire to the people inside.
I think you have to change or abandon the metaphor to make your point. These are not true statements of spoiled eggs and house fires, and so much so as to make a reasonable claim about institutions and malfeasance look absurd.
> These are not true statements of spoiled eggs and house fires, and so much so as to make a reasonable claim about institutions and malfeasance look absurd.
True, but I disagree with the conclusion. When I try to map it back to reality and it doesn't make sense, it is indeed an indictment of the analogy. But the fact I have to abuse the analogy to make that mapping coherent is not my problem; it's not my analogy.
However, within the context of the analogy, and if one can imagine that absolutely insane scenario, the logic holds.
An egg gone bad and a house fire are not the same sort of concern. A government agency doing two types of evil thing is more akin to ranking whether the small fire in the living room is more important than the slightly larger fire in the media room. It’s all fire in your house.
Will probably make my life personally worse than the others.
And?
the phrasing makes it sound like historically the FBI does not work for evil. which is somewhat annoying to someone who believes the FBI has been primarily a tool of evil.
I just wish the default way people used archive.is was to generate their long form link instead of their short link, as, if the site ever goes down, all of the links people have posted where they don't change the setting and thereby paste the default inscrutable code link will be destroyed... building a service with a pernicious behavior like that is ALSO not okay in its own way.
This actually seems like a big design flaw in resource locators. Perhaps someone here can make an alt DNS that resolves to new homes for content when the Canary dies.
where can you change the setting? I was not aware of this.
At the top click "share" and then select the "long link".
http://archive.today/2023.11.30-020758/https://www.theverge....
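The practical difference is that the long form carries the capture time and the original URL in the link itself, so both survive even if the archive goes away. A quick Python sketch of recovering them; the pattern is inferred only from the example link above (`/YYYY.MM.DD-HHMMSS/<original URL>`), so treat it as an assumption rather than a documented format, and `parse_long_link` is a made-up helper name.

```python
import re
from datetime import datetime

# Assumed long-link shape, inferred from the example above:
#   http(s)://<archive host>/<YYYY.MM.DD-HHMMSS>/<original URL>
LONG_LINK = re.compile(r"^https?://[^/]+/(\d{4}\.\d{2}\.\d{2}-\d{6})/(.+)$")


def parse_long_link(link: str):
    """Return (capture_time, original_url), or None for short links."""
    match = LONG_LINK.match(link)
    if match is None:
        return None
    captured = datetime.strptime(match.group(1), "%Y.%m.%d-%H%M%S")
    return captured, match.group(2)


if __name__ == "__main__":
    # Hypothetical links, not real archive entries.
    print(parse_long_link(
        "http://archive.today/2023.11.30-020758/https://example.com/story"
    ))
    print(parse_long_link("https://archive.is/AbCdE"))
```

With a short link there is nothing to recover: the opaque code only means something to the archive's own database.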
> Knowledge itself should become a human right.
IMO the natural right is for humans to share what they've learned up to and including verbatim reproductions of works by others. I also think that abridging this right to grant some exclusivity for artists (the broader "art" meaning scientists/writers/authors/musicians/coders/etc) is suitable. Copyright is/was a good idea. Its fair use clause is a good idea. The duration of exclusivity under current laws, however, seems excessive and beyond mere encouraging art.
Per the constitution copyright was meant to encourage the progress of the arts and sciences. Whenever and if ever it does the opposite it has failed.
Many rights holders would very much like us to forget this.
That happened when Congress made copyrights last more than 10 years.
I agree. Knowledge should belong to all of humanity.
But then also don’t be angry at big corporations when they scrape the entire internet.
There's no contradiction in wanting an abolition (or at least substantial curtailment) of copyright while also being upset that mass violations of copyright magically become legal if you've got enough money.
Enforcement being unjustly balanced in favor of the rich & powerful is a separate issue from whether there should be enforcement in the first place—"if we must do this, it should at least be fair, and if it's not going to be fair, it at least shouldn't be unfair in favor of the already-powerful" is a totally valid position to hold, while also believing, "however, ideally, we should just not do this in the first place".
> There's no contradiction in wanting an abolition (or at least substantial curtailment) of copyright while also being upset that mass violations of copyright magically become legal if you've got enough money.
Why can't you just be happy for those few who are lucky enough to be able to violate copyright with no consequences? Yes, I know you'd want everyone to be able to violate copyright, but we're not there yet.
"Why can't we just be happy" that individuals and smaller companies get sued into oblivion over copyright violations, while large AI companies can scrape everyone's data and use it for training and completely ignore copyright while generating code and images and text and music based on all that that displaces the demand for the originals? Is that what you're asking?
Because we’d like the powerful to feel the crunch from bad law rather than get a backdoor, so they have to use their power to change things for everyone instead of just getting it changed for themselves.
More often than not the rich just codify the "backdoor" for themselves in such cases. A rich man can buy the $30,000 registered machine gun and pay the $200 NFA stamp and be 100% legal; the poor man who 3D prints $0.50 of plastic to do the same thing goes to jail for 15 years.
The entities training AI are not anti-copyright, or anti-intellectual-property. If I were to steal their AI models they would sue me into the ground and probably win. Furthermore, even if you are anti-copyright, you probably still don't want your shit scraped by AI trainers since the bots are extremely aggressive, almost like a bona fide DDoS attack.
AI is not an attack on copyright, it is an attempt to replace it with something worse.
He might be happy for them and also sad because there is no rule of law.
It's the rule of bribery, quite frankly. Name it lobbying and nobody bats an eye.
You're assuming way too much with "not there yet". The point is the corpos will violate copyright with impunity today, and then in a few years sign a bunch of settlement agreements and pull the ladder up behind them.
I'd love to see copyright slowly become irrelevant, but even with that goal we should expect to see large corpos being the last to stop respecting it.
It's not so simple.
There are violations of copyright which are ethically fine, e.g. pirating an old movie to watch.
Then there are violations of copyright which are ethically problematic, e.g. pirating an old movie to sell.
When a big company violates copyright the nature of the violation is always much closer to the latter.
Pirating an old movie to sell is not considered ethically problematic everywhere. In many, many countries on earth pirated DVDs were sold at the marketplace, and no one – buyer or seller – had qualms about it. When the authorities shut down such sales, it was almost entirely because they were being pressured by the USA and a handful of other Western governments, not because the local ethical perspective on this changed.
This genre of comment is so tedious. We aren't talking about everywhere, the FBI is a US agency, the big companies we're discussing have won in US court. This thread is about the US.
The FBI are police, not the entire American people. Courts are courts, not the entire American people. Among the American people, there are those who don’t find selling pirated media ethically problematic and would like to see the kind of marketplace sales and wide use of BitTorrent boxes that people in other countries have enjoyed. If something isn’t an ethical universal, that means people in your country can hold a pro-piracy view, too.
While it's true people are upset at AI companies profiting off of artist creations with no compensation, I know a lot of people are also reacting to how the recent AI companies have been scraping the web. The reason folks are using Anubis and other methods is because unlike Google which did have archiving of sites for a long time (which was actually a great service), these new companies do not respect robots.txt, do not crawl at a reasonable rate (for us, thousands of hits a minute from their botnets - usually baidu/tencent, but also plenty of US IPs), hit the same resource repeatedly, ignoring headers intended to give cache hints, stupidly hitting thousands of variations of a page when crawling search results with no detection that they are getting basically the same thing... And when you ban them, they then switch to residential ranges. It really is malicious.
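For contrast, the well-behaved version of this is not hard. Below is a rough sketch using Python's standard `urllib.robotparser`: check robots.txt, honor `Crawl-delay`, and don't request the same URL twice. The robots.txt content, the `examplebot` user agent and the `plan_crawl` helper are all invented for illustration; this is not how any particular crawler actually works.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Crawl-delay: 2
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def plan_crawl(parser, user_agent, urls, default_delay=1.0):
    """Return (allowed_urls, delay_seconds) for a polite crawl.

    Duplicate URLs are dropped so the same resource is not hit
    repeatedly, and disallowed paths are skipped entirely.
    """
    delay = parser.crawl_delay(user_agent) or default_delay
    seen = set()
    allowed = []
    for url in urls:
        if url in seen:
            continue
        seen.add(url)
        if parser.can_fetch(user_agent, url):
            allowed.append(url)
    return allowed, float(delay)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    queue = [
        "https://example.com/about",
        "https://example.com/search?q=a",
        "https://example.com/search?q=b",
        "https://example.com/about",
    ]
    allowed, delay = plan_crawl(parser, "examplebot", queue)
    print(allowed, delay)
```

A real crawler would then sleep for `delay` seconds between requests and also respect cache headers, but even this much would avoid the "thousands of variations of a search page" problem.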
> AI companies profiting
Are they?
If you boil it down to the AI companies are making money (subscriptions, etc.) based on content they did not pay to produce, then they are profiting from someone else's hard work.
If you boil it down anyone on this planet can access ChatGPT (and other for profit LLMs) for free to answer any question they might have.
Knowledge is shared among humanity at a rapid speed. Everyone benefits.
It’s mind boggling how anyone could be opposed to that.
Revenue is not profit.
I didn't say it was. I understand that profit = revenue - cost.
I said they're profiting from other people's hard work, a separate concept.
'stealing is fine if you lose money when reselling'
I don't believe I wrote anything of the sort.
profiting != profit
Do you use google search for work? Then you are profiting from someone else's hard work!
The difference is that when you search on Google - at least before AI overviews - you end up at the source site.
Also Google respects robots.txt. Every site that Google surfaces chose to be in the index.
That's not entirely true. Google might or might not hide your pages from the index. They're definitely going to scrape them anyway. They also display summarized info from your page (the famous "what is scraping" joke showing Wikipedia's summary). Finally, you might just get your answer without visiting, just by skimming the result description.
Well, don't we have enough Acme Corporations in the world that were unprofitable and existed purely on VC life support before they killed off all the competition by dumping the prices, and then made them skyrocket to recoup investments and become profitable after becoming monopolists?
People at these companies are receiving a salary to do these things that the person you're responding to is opposed to.
While not all of the companies in question may be profiting from these things, some of them are, and most if not all of their employees certainly are as well.
I don't care that they scrape my website.
I DO care that nearly 2M different IPs are used to try to pull 42k commits out of a git repo one by one when they could just git clone it ...
I wish the companies would just pay a few technically-competent companies to do the scraping. Pay two so you can check their work, maybe, but let's get past the point in time when dozens (or more?) of companies are all simultaneously hammering the web.
My pie in the sky pitch is that the US Government (and others) should solve this, the legality and the compensation problems, in a single swoop. Make submission of your work to a federal model data set a requirement for obtaining copyright protection. License the data set (and heck, maybe even charge for making custom models) for nominal fees to anyone who wants it, with indemnification against copyright lawsuits for works deriving from the licensed model. Pay copyright owners a limited-time royalty from these licensing fees. Everyone wins and we can stop needing a billion bots scraping a billion sites a billion times a day.
While I would like to see it abolished entirely (including patents) I do have to compliment how you've described a formula that is actually possible to implement.
To deny people access to things is one thing, wanting to do it by impossible means is quite something else. Who even has time to scavenge the universe looking for possible infringement on their works and also the money to deal with it?
There are perfectly good LLMs built specifically to shit out swaths of mediocre code to do that, why would you pay anyone?
A lot of the outrage isn't at scraping, it is at the disruptive techniques used to do so. Like web-scraping whole websites that already provide convenient images of their content for download.
Feels like now we're just redefining our rules so that the people we don't like are out and the people we like are in. Does the content creator have the right to determine how their work is used or not?
I have a right to my copyrighted work, and I also have a right to set and enforce access rules to a server I operate to grant people access to it.
This is a false equivalency I'm surprised no one else has brought up. An archive of a site preserves attribution inherently; scraping and training do not.
Is it? I thought it was ridiculous at first, but the more I think of it... both are scenarios where a corporation is scraping billions of webpages. We like the reason archive.is does it, but unless it's some kind of charity, I think it's a reasonable comparison.
archive.is is a charity, no? Or at least they take donations. It seems the legal entity behind it is nebulous, but they don't have ads and have no paid product or offering.
They sure as shit do have ads. Have you ever accidentally followed a link using a browser profile that has no ad blocking enabled?
I only rarely browse without some form of content blocking (usually privacy-focused... that takes care of enough ads for me, most of the time). I keep a browser profile that's got no customizations at all, though, for verifying that bugs I see/want to report are not related to one of my extensions.
Every once in a while, I'll accidentally open a link to a news site (or to an archive of such a site) in that vanilla profile. I'm shocked at how many ads you see if you don't take some counter measures.
I just confirmed in that profile: archive.is definitely puts ads around the sites they've archived.
So if OpenAI or <AI scraper of the day> adds attribution to their AI-generated answers, everything is OK?
It would be closer to okay.
Copyright exists to "promote the Progress of Science and useful Arts."
Anything which does that should be legal, and anything that stifles those advances should not.
Big corporations aren't humans.
Corporations large and small don't do anything. It's always a person. The question you are answering, even if you don't think you are, is whether a few people can get together and act in concert and still retain their rights.
they are persons under US law.
I imagine there's a whole lot of snarky epitaphs which the remnants of the humankind could place on this civilization's gravestone, but citing this exact law might make for the best one.
Then one should be able to put them on death row under US law.
Only in a couple of very specific and narrow ways. They are not considered persons generally under US law. They are legal fictions that have been granted a subset of rights that people have.
And that subset of rights keeps expanding.
If you're looking to US law to discern who is a person and who is not you are deeply lost.
It would solve a lot if that was taken to the extreme. Sorry Amazon, but your working conditions killed five people. Your business license is going to jail for 40 years, good luck getting contracts with other companies with murder on your record when you get out.
US law only applies in the US. Plus, the company in question seems to be based in Canada, so outside the FBI's jurisdiction.
> US law only applies in the US.
Where US law applies varies by which law it is; there are US laws that apply only outside of the US [0], as well as US laws which have application both inside and outside the US.
[0] e.g., the federal torture statute, 18 U.S. Code § 2340A(a), “Whoever outside the United States commits or attempts to commit torture shall be fined under this title or imprisoned not more than 20 years, or both, and if death results to any person from conduct prohibited by this subsection, shall be punished by death or imprisoned for any term of years or for life.”
https://www.law.cornell.edu/uscode/text/18/2340A
People pretend the GDPR applies to everyone, so why not the DMCA?
GDPR applies to you if you come to the EU to peddle your wares.
The DMCA absolutely does apply to European firms selling their goods and services in the US.
that's just wishful thinking. US law applies world wide as long as Trump is willing to reach out and nab you. ask the Venezuelan fishermen.
That's not even US law, it's straight up murder outside of the law.
So was Kim Dotcom. Biden went after him anyway at the behest of big media.
I mean, it's not like it was just Biden. His extradition proceedings took place during three different US presidential administrations. You might as well include Trump and Obama in there as well.
But US law isn't even the law of the world, let alone the definition of reality.
How are you planning on doing anything about it?
Does it need to be? Trump has been assassinating people in boats with no evidence whatsoever.
One thing is not the other. A corporation is not a human (and no I don't care what Citizens United says). A corporation has no inherent rights.
It's not that they're scraping the internet, it's that they're scraping the internet, profiting off the data they take, and still using the copyright regime to go after others who do unto them.
Hot take here, I know, but some of us believe the law should treat large corporations differently than it treats individuals when it comes to their rights and privileges.
> But then also don’t be angry at big corporations when they scrape the entire internet.
I'm only angry with them when they pay hush money to IP extortionists.
This seems like an incredibly disingenuous take. There's a marked difference between collecting information to freely share with the rest of humanity, and collecting information to feed into algorithms under the guise of "artificial intelligence" with the pretense of enriching their finances and putting others out of work.
Anyone on this planet can access ChatGPT (and other for profit LLMs) for free to answer any question they might have.
This is true knowledge socialism.
- ChatGPT is not about knowledge.
- ChatGPT is in the "bait" phase of "bait and switch" plan. It is trying to make us dependent on it, so that it can extract maximum profit later.
we (all of us) do not own chatgpt; we (all of us) do not share in the profit from chatgpt; this is not what socialism is.
That's a bad take, just like open source code is available to all, it's not the case you can always resell it or repackage it for your own profit.
Information can be made available to all, and at the same time, we can make it so others cannot resell or repackage it for profit like what AI companies are doing.
You can sell open source code for profit.
https://www.gnu.org/philosophy/selling.en.html
Well, as long as they pursue a "copyright for me but not for thee" regime, you can.
> I actually think there should be a human right to data. With that I mean, primarily, knowledge, not to data about a single human being as such
How do you suggest we fund the difficult work needed to investigate, research, and produce such data?
Remember that facts are not copyrightable, and as such, can't be restricted by copyright. Creative expression of those facts, on the other hand, can be.
> The FBI is trying to kill data.
For you. I'm sure they love data as long as only they can access it.
> Wikipedia is somewhat ok, but it also misses a TON of stuff, and unfortunately it only has one primary view, whereas many things need some explanation before one can understand it.
Last I checked, they had archive.is blacklisted; the people with power there had (as far as I can tell) come to the conclusion that people using that site to prove that websites had stated X on date Y were the bad guys. Of course, they still have archive.org sources everywhere, so the objection is not actually to archiving page content.
Tons of claims also seem to be sourced ultimately to thinly-disguised promotional material (e.g. claims of the prevalence of a problem backed up by the sites of companies offering products to combat the problem) and opinion pieces that happen to mention an objective (but not verified) claim in passing.
> they had archive.is blacklisted
What do you mean by this? Wikipedia actively encourages people to use archive.is links in citations:
https://en.wikipedia.org/wiki/Help:Archiving_a_source#Archiv...
Well, it was last I checked, which may have been some time ago (I couldn't easily find the changes with the WikiBlame tool).
That would be 7 years ago. It indeed was blacklisted in 2014 but restored in 2016.
2016 was 9 years ago, but thanks for the heads-up.
The difference is that we know who's running archive.org. We don't know who's running archive.is. That's perfectly fine for private use but unacceptable for a site like Wikipedia.
It's not that difficult.
Are there examples of archive.is falsifying information? I care more about "what you do" than "who you are".
When you don't know who runs archive.is, you can also never know if he sold it to someone else (except if either party publicly announces it).
This type of possibility really worries me. Archive.is is much closer to actual history in many ways. If the data there starts getting corrupted or biased, there’s no way to know what was truly there.
The idea that the permanent record of the internet could hinge on the ethics of one stranger behind a server rack is deeply unsettling.
This happens on .gov websites a lot now, and it is deeply unsettling.
Knowing the current owner of archive.is doesn't help; we need more full, independent Internet mirrors that can be compared against each other.
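To make the "compare mirrors against each other" idea concrete, here is a minimal sketch. It assumes you already have the same page's text from several independent archives; the mirror names and snapshot strings below are hypothetical, and real archives add banners and rewrite links, so the normalization step would need to be far more careful than this.

```python
import hashlib
from collections import Counter


def digest(snapshot: str) -> str:
    """Hash a snapshot after collapsing whitespace so trivial formatting differences are ignored."""
    normalized = " ".join(snapshot.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def compare_mirrors(snapshots: dict[str, str]) -> dict:
    """Group mirrors by content digest and list the ones that disagree with the most common copy."""
    digests = {name: digest(body) for name, body in snapshots.items()}
    counts = Counter(digests.values())
    majority_digest, votes = counts.most_common(1)[0]
    dissenters = sorted(name for name, d in digests.items() if d != majority_digest)
    return {"majority_size": votes, "dissenters": dissenters, "unanimous": not dissenters}


# Hypothetical copies of one page held by three independent mirrors.
snapshots = {
    "mirror_a": "The minister said X on Monday.",
    "mirror_b": "The minister said  X on Monday.\n",  # same text, different whitespace
    "mirror_c": "The minister said Y on Monday.",     # altered copy
}
report = compare_mirrors(snapshots)
print(report)
```

A majority vote only tells you which copy is the odd one out, not which one is true, and with only two mirrors a disagreement is a tie. That is the argument for having more than two.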
Archive.is does not archive the page exactly as is.
And when the information on archive.is starts getting corrupted, those links can be adjusted or removed.
Who will catch it, when and how?
I never said there are examples. However, "who you are" matters, even if you don't care. At least when the who is known, we can gauge the trustworthiness and what bias exists, because there's always a bias. When who is not known, you don't know what bias to account for. That's not trustworthy and not reliable. And when the site is closed source and you have no idea how it's being run, nor by whom, you don't know "what you do" either.
Does wiki only allow people to vote sources where the owner is known?
"wiki"?
We also know archive.org has removed pages some "important" people didn't like.
I never concluded that, but this actually allows someone, the anonymous operator here, to change the history/info retroactively if needed. For Russians, as an example, this would be a powerful tool to manipulate the narrative, which is cultural there at this point. Pretty smart and dangerous if it is really operated by them.
Such practices are by no means limited to Russia.
Absolutely. They were just the example that came to my mind first.
Also happens to likely be where the site owner is based
https://gyrovague.com/2023/08/05/archive-today-on-the-trail-...
Ignoring the fact that one of this service's primary reasons for existing is that corporations and governments are already abusing their ability to retroactively change history.
If this is truly a concern then the answer is to have more than one publicly-accessible independent archive service. Archivebro has never taken any steps towards securing a monopoly on archiving things. The FBI are the only ones doing that.
Also not everybody in Russia is on the FSB payroll. News media always stops investigating as soon as there is credible information that somebody or their server is located in Russia because if they learn too much then it becomes difficult to discredit them as "possibly being linked to the kremlin". If you used any other nationality to imply that somebody is acting in bad faith on behalf of a hostile foreign government without additional evidence those same journos would call you a racist and try to get you canceled.
> We don't know who's running archive.is.
I always assumed it was officially sanctioned by the Israeli government, for some unspecified (but almost certainly nefarious) purpose.
Not sure if your assumption was based on the TLD, but `.is` is Iceland, not Israel (which is `.il`).
Ty to everyone who pointed this out. I have a feeling I've googled it several times, and then promptly forgot. Maybe it will stick this time.
.is is Iceland, not Israel (that's .il).
Why Israel? From the TLD? IS is Iceland. Israel would be IL.
[flagged]
Where did you check this? While neither are listed on WP:RSP, I know many cites are changed into web.archive.org links once they go down.
I heard stories of incriminating stuff for higher-ups disappearing from archive.org.
I heard stories about a potential Oracle data breach (I think mainly affecting their customers) being removed from Archive.org too. It’s because in general, they comply with requests to remove stuff, which is understandable from an ethical perspective. But do they at least try to explain the reason for the takedown? Is it just not feasible to do that?
Archive.org honors robots.txt retroactively. So anybody can take down their own stuff by adding a rule to their robots.txt file.
This is no longer true. They changed their policy to ignore robots.txt in 2017. I seem to recall that they still respected robots.txt later, though I can’t find any more information on it and may be misremembering. Currently, they do not.
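For anyone wondering what a robots.txt exclusion looks like mechanically, here is a minimal sketch using Python's standard `urllib.robotparser`. The `ia_archiver` user-agent token and the example.com URLs are assumptions for illustration only, and this shows only how a rule is evaluated by a crawler that chooses to honor it, not what archive.org actually does today.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler entirely,
# and block everyone else only from /private/.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler honoring these rules may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed(ROBOTS_TXT, "ia_archiver", "https://example.com/article"))     # False
print(allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/article"))    # True
print(allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/private/x"))  # False
```

The file is purely advisory: nothing stops a crawler from skipping the check, which is why a policy change on the archive's side matters more than the file itself.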
Does it mean archive.org works for any sites?
My main use for archive.is is for sites that somehow cannot be archived (a message will show up mentioning this site cannot be archived or something along these lines).
archive.is is generally pretty good in forcibly attempting to get an archive, if the HTML doesn't work, the screenshot will work fine. Although archive.is doesn't seem to handle gifs/videos.
> Does it mean archive.org works for any sites?
They respected exclusion requests after they stopped respecting robots.txt. I don't know their policy for new exclusion requests.
Oh. Did not know that.
> Last I checked, they had archive.is blacklisted; the people with power there had (as far as I can tell) come to the conclusion that people using that site to prove that websites had stated X on date Y were the bad guys.
Or they're worried about the paywall by-passing functionality (which is probably what a good portion of people use it for) and copyright claims against archive.today potentially having it taken down and thus breaking a lot of links.
"Knowledge", for the most part is. What I see archive.is get used for most frequently is circumventing paywalls on paid-for media websites, which is journalism. And while freedom of the press is a constitutional right in functioning democracies, freedom of access isn't enshrined as much. But most of the things are background articles, the actual news is freely available to all still.
I'm all for archiving open webpages though. And I'm honestly surprised the Internet Archive is still standing. Their decision to open up their book library was a dangerous mistake.
I've been thinking the Library of Congress should buy Archive.org
It should become part of an entity that is very difficult to kill, and will exist for a long time.
Although I guess that's a function of culture, and I think respect for libraries is rapidly declining.
If the last 9 months have shown us anything, it's that long-running government institutions are a lot easier to kill than we thought. And the idea of archive.org being under the control of an administration like the current one in the US is pretty frightening. They would have absolutely zero qualms about deleting and changing that data.
Ditto for the other side of the aisle. We still don't know who was really the president, while everyone pretended Biden was not a dementia patient.
Totally agreed. They are just less obvious about it.
I’d rather not rely on a government owning it that has been very open about their desire to control what people perceive as true
Don't make it another American echo pool. Nobody needs more of that
All right, one of the other countries that cares about free speech can buy it ;)
How about no country needs to own something like this. And if one does, there are dozens of Western countries with a better freedom-of-speech record.
Archive dot org deleted a lot of stuff during their "hack" a while back. I'm convinced it's already been compromised. The US/EU/every government wants the ability to rewrite history.
Look up the article "Who Archives the Archivist?" (it's difficult to find. Use quotes. Don't link it; the site is banned here).
> We can not allow the FBI to work for Evil here.
It's not up to us to tell the FBI what to do, that's a fatal misunderstanding about how power works. You can demand to see the FBI's manager, but I doubt it will get you anywhere. You can choose between two candidates offered by the privately owned and run political parties for whom the FBI works, but I don't think that will help either.
> Knowledge itself should become a human right.
Human rights are created by legislation. Unless you own a legislator (or rather, many legislators), you will not be involved in this. The people who own (and parcel out) knowledge itself, however, will be involved.
It would be better if we stopped making pronouncements about what people more powerful than us should be doing. It's like prisoners talking about what the jail should be doing. You should talk about what you should be doing. And don't mistake demanding for doing, or walking in the street with your friends for activism (unless you're violating curfew and are prepared to defend yourselves.)
Be brave. Put forward a program that might fail. Ask people to help you with it, ask them to follow you, tell them where to show up. Join someone else and help with their program. Don't demand, then whine when they say "of course not." The FBI is not your daddy, and the people running it are not running it on your behalf.
I don't mean to be personal, but this type of talk is empty. The way things get decided is through power; and the way weak people exercise power is collectively, through discussion and coordinated action. Anybody can talk about what they would do if they were dictator of the world.
> We can not allow the FBI to work for Evil here.
The whole US is evil.
"I actually think there should be a human right to data."
Interesting tagline, but probably has far too many side effects.
If it were me, I would try to boil this down to some negative right, those usually have less side effects, and even then I would be very careful.
It's not any less evil than when the past administration colluded with Amazon, Apple, and Google to erase Parler off the Internet.
You made your bed. Now lie in it.
I no longer support the freedom of speech for people and groups that actively call for the removal of these rights from people they don't like.
I read the discussion on HN when this all went down. I won't forget.
All you are guarding against here is some bits in a machine. Knowledge can be embedded in other substrate, other medium. Acquired by more actions than reading social media.
IMO what you really mean is "I should be free to sit and surf the web secure in my belief others are acting properly, while subsisting on externalized labor that props up my biology".
Asimov and countless others highlight this difference between being a passive reader of others' ideas as orthogonal to knowledge acquisition. If you aren't conducting the experiments you acquired nothing but memory of someone else telling a story.
4% in the US hunt now. So to get people living rather than acquiescing, all you office drones are going to have to learn your way out of helplessness. Go acquire knowledge of how to grow a potato.
You won't because you don't want to acquire knowledge. You want the world to gift you knowledge and experience through as little effort of your own as possible. Typical American capitalist. 8 billion across the globe aren't that impressed by 300 million's obvious grift.
We tried making knowledge free and available to everyone online. Capitalists came and gobbled it up to sell back to us as "AI". Unfortunately we can't have nice things with people taking advantage of them.
> We need to preserve data. ... I actually think there should be a human right to data.
I'm not going to simp for the FBI here, but come on: do you have a human right to preserve my private photos leaked by a stalker or a hacker? Because archive.is is famously unwilling to play nice here.
I don't know if this case is about that, or about pirated content, or about the administration trying to scrub something embarrassing off the internet. But the fact that archive.is cheerfully enables all three "use cases" should probably give you pause.
It's a delicate line to walk because takedown processes can be abused to do things we don't like. But "lol, tough luck, information wants to be free" is not a sensible blanket response in a polite society.
https://archive.ph/XdQRp
thank you!
Is there a dump of all archive.is sites (similar to libgen dumps) in case it goes down, so it could be set back up again?
If only archive.is would share a dump of its archives through torrents the way Anna’s Archive does — it’d make it much more resilient.
I don't know why all the archive sites don't share backups. The Wayback Machine and archive.is are the largest archive sites by far, and they don't share bulk downloads of the majority of the websites they catalog.
They of course don't have to, but having something like Anna's Archive but for website history would be great.
My guess is the sheer amount of data that archive.org has, which means:
- even higher costs associated with seeding archives (egress traffic, storage iops capacity required etc)
- chances of finding a 3rd-party seed for arbitrary file would be pretty slim, which means seeding on your own most of the time, which would make this hardly any better than offering files over HTTP only.
Anna's Archive, after all, is only an index.
They pardoned the Silk Road drug lord to go after a copyright infringement-lord instead? It's not even in their effective jurisdiction, if this indeed is a Russian national. Don't they have more important Russian crimes to investigate?
I read there was a US government investigation tracking Ukrainian children abducted by Russian forces, but supposedly there weren't enough resources [0] to sustain that.
[0] https://www.npr.org/2025/03/19/nx-s1-5333328/trump-admin-cut...
> They pardoned the Silk Road drug lord to go after a copyright infringement-lord instead?
The president’s pardons are not popular with the FBI and law enforcement. The FBI is not happy about doing all of the work to prosecute people only to have the president override it for political reasons.
The reporting I've seen is that they were making efforts to get rid of anyone at FBI who would be upset at this. They are also reported to have an employment screening question that requires applicants to say the 2020 election was stolen.
Scary stuff! Do you perhaps have a link to this reporting so that we can read up further on this?
Just imagine how happy they are about this
I don't think it is political reasons, seems like it is for large donation reasons.
That is a political reason...
It convinces others that you're willing to pardon them too in exchange for money and convincing other people is the definition of politics.
It is political, it is also corrupt.
That is not political, it is purely a service offered at a price. There is no specific political agenda behind these pardons (i.e., they don't pardon only folks who are, for example, Evangelicals or anti-immigration or whatever), the only criterion is payment.
> [1] Politics
> Politics (from Ancient Greek πολιτικά (politiká) 'affairs of the cities') is the set of activities that are associated with making decisions in groups, or other forms of power relations among individuals, such as the distribution of status or resources.
I don't know how much more obvious I can make this for you. Bribery is political.
[1]: https://en.wikipedia.org/wiki/Politics
> Bribery is political
Bribery is political. But it's not taken to be a usual part of politics in the West. (Similar to how the Roman word for ambush was the same as their word for treason. Treason isn't taken to be a usual part of politics. Ambush, for them, not a usual part of warfare.)
Basically, you're both right because what is and isn't political is itself a political question.
Word meanings evolve. Virtue literally means "manliness" in Classical Latin but only a pedantic dick would insist we use it in that sense. Polis and its related words meant something different to the Greeks than they do to us.
Well akchooally the word man up until very recently meant human, so manliness, meant the state of being manly, AKA a human.
Right, "Politics" evolved from "affairs of the cities" to "the set of activities that are associated with making decisions in groups, or other forms of power relations among individuals, such as the distribution of status or resources.".
it’s the current meaning he’s reciting he’s just adding etymological context
Politics (from the first Wikipedia pages)
The word ‘politics’ is derived from the word ‘poly’ meaning ‘many’, and the word ‘ticks’ meaning ‘blood sucking parasites’
First at rec.humor.funny on October 11, 1992
Selling pardons for money is inherently a very political act. It means that you are aligning yourself with moneyed interests, which is clearly the heart of Trumpist politics. Setting ideology aside even, the open selling of pardons sends the message to the moneyed interests in general that he's on their side, even if they don't need a pardon at this exact moment. It serves both practical (get rich people to like you and therefore donate money to your campaigns and causes to help them succeed) and ideological (supply-sider-esque doctrine going back to at least the protestant reformation says that rich people should be in charge because they're rich, QED) purposes.
I am pretty sure that they only pardon people who are pro or at least neutral to Trump. I doubt that he would pardon anybody who's an outspoken critic even if they offered him a bribe.
The hypothetical pardon was promised by then candidate-Trump in a speech at the Libertarian Party convention.
The specific political agenda was to get support from libertarians, who lean conservative, but don't like Trump much - because he rejects libertarianism.
That's as political as you can possibly get. It wasn't a behind the scenes thing. It was literally announced at a political convention.
> they don't pardon only folks who are, for example, Evangelicals or anti-immigration or whatever
Who has Trump pardoned that wasn't a supporter of his?
You can go one further. Who has Trump pardoned that wasn’t a complete scumbag?
https://www.newsweek.com/full-list-donald-trump-pardons-seco...
Touche.
I'd like for you to define "political" please.
[flagged]
> seems like it is for large donation reasons.
more like large bribe reasons.
Money is protected political speech.
In the US, this is much the same thing
Is there a difference?
I guess OP means to say it is not ideological reasons.
OP means to say this type of pardon is not meant to win votes or satisfy the demands of constituents, like with convicted cops or people with weed-related crimes, or pardoning draft dodgers after Vietnam or the Civil War, and so on.
While money is deeply involved in politics and financial corruption is there, occasionally ideological (political) actions without direct financial benefits also happen.
It is hard to say whether this pardon of the Silk Road founder was motivated by libertarian or crypto community pressure, or by financial donations to the party, etc. Both are possible, even at the same time, but they are different considerations.
> I guess OP means to say it is not ideological reasons.
“Government exists for the personal benefit of the leader” (or simply “for my personal benefit as the leader”, with even less generalization beyond that) is an ideology, actually.
It’s not one that is popular to embrace publicly, but that's hardly unique among real ideologies.
In 2019, Giuliani's assistant chided John Kiriakou that pardons couldn't be discussed in his presence but that the fee was $1 million for Giuliani and $1 million for Trump. Given inflation, I'd bet that pardons now cost around $3 million.
> The president’s pardons are not popular with the FBI and law enforcement.
This is an unverified blanket statement. We don't know what percentage of the FBI and law enforcement agree or disagree on anything.
You are talking about Kash Patel's FBI. The guy who has a hit song and book called "The Plot Against The King" pretending the 2020 election has been rigged and who maintains it to this day.
The FBI does what Trump tells them to do, that's it.
And why aren’t the people who “stole” the election being prosecuted by Trump's DOJ and FBI? He had proof, remember? I wish liberal media would hammer this point and expose the lie for what it is.
Source? Any of them still employed?
> The president’s pardons are not popular with the FBI and law enforcement
Well if they don't like it I'm sure he would be happy to start a bidding war.
Alternatively, FBI agents could also be upset that they won't have a chance to steal more bitcoin from foolish Libertarians (1).
1: https://arstechnica.com/tech-policy/2016/08/stealing-bitcoin...
They got me—a copyright infringement lord—too. The FBI profile assigned to me even wrote in a case study that the FBI thought I was making millions, amongst other misses.
Their priorities are highly political.
Would love to hear more about this if you're inclined to share or have written about it somewhere. Legal contacts between the federal government and individuals are often surreal.
The case study: https://ieeexplore.ieee.org/document/10628922/
I read this, and found it to be a disappointing read. It had few details, and instead was more of a social sciences paper, covering basic ideas in academic language.
Roughly it seemed to be suggesting that:
* It's easier to deceive someone if they first solicit for help on a forum
* You can trick someone into revealing sensitive info like which infrastructure provider is used by nerdsniping them: "My mate thinks you should just enable health checking on AWS ELB", and then they reply "Well actually I use Hetzner". Except I'm guessing it was more elaborate than that.
I guess I wasn't the target audience of the article though.
joshmn, what did you think of the article?
Do you find it difficult to trust random commenters online now?
I see you mentioned you can't discuss technical details, but if/whenever that expires (?), that'd be great to hear.
Hi Chocalot,
I also found it underwhelming, though I'd like to think I’m the most scrutinizing of the subject matter. There's some nuance between my take on my behavior and the profiler's, but I'll give them the benefit of the doubt—they only had my Reddit posts to go on and had to package that for investigators.
I still tend to trust by default and make witty comments or jabs that sometimes land flat, so the article was accurate in that sense.
As for talking to the undercover, I made a point of keeping no secrets about my site's technical implementation. Between me and some "competitors," I was usually the first to respond to upstream provider changes—I'd even share my findings without expecting anything in return. Anyone could’ve asked about my issues, and I would've told them.
Trust is the most valued currency in the piracy world, and I worked hard to earn it with both peers and customers. Acting otherwise would've gone against that—and against my own morals. My being neurodivergent may also be worth noting in my willingness (or unwillingness from a free-will perspective) to trust others.
Technically speaking, the site worked by reverse-engineering the league's official streaming services—a few curl requests, careful observation of responses, and adapting them to my needs. There's more to it, of course, but my 2016 MVP was barely 50 lines of Ruby and a plain HTML file. TorrentFreak got some of the details right.
Appreciate it. Will read both of these this evening.
Feel free to reach out with thoughts. :)
https://prison.josh.mn
> It's not even in their effective jurisdiction
Like that ever stopped USA before...
It’s currently not stopping the US federal government domestically.
Shouldn't that be an Interpol or UN responsibility? Why is the US tracking foreign children?
The US gov doesn’t even care about copyright infringement, just in the cases where big companies are inconvenienced by it and it’s done by an individual / small company instead of a mega AI corp swallowing up all copyrighted content to vomit out their own spin on it through algorithms.
Federal investigations tend to only go after big fish yes.
The root problem is the IP laws that congress passed. There will always be large pressure on law enforcement from the industry if you give them that leash.
there's no lordship because afaik there's no direct profit
> Don't they have more important Russian crimes to investigate?
not if their top priority is to erase memory
Trump pardoned Ross largely to buy the (big L) Libertarian vote. It was announced in his speech at the Libertarian convention.
Not for any ideological reason.
It's not the libertarian vote that he cared about in particular so much as it was the firehose of crypto money that was supporting free ross.
Both are true. For a long time the Libertarian party was seen as drawing away small numbers of protest votes from the GOP, being populated by (mostly) guys who rejected Democratic over-regulation and nanny-statism but also rejected the GOP's anti-abortion politics, criminalization of drugs etc.
A few years ago there was an organized effort to capture key roles in the Libertarian party and focus the organization more on property rights and capitalism, with less emphasis on personal freedoms and constitutional limitations on government. This effectively split the Libertarian party, neutering it as an electoral factor.
Now, the Libertarian party never mustered a large share of the vote, but many electoral contests are won at the margins. They managed to get ~3% of the vote in 2016, but lost >80% of that over the following 2 elections.
https://www.the-pechko-perspective.com/political-commentary/...
https://www.splcenter.org/resources/hatewatch/libertarian-pa...
https://en.wikipedia.org/wiki/List_of_United_States_Libertar...
This seems more likely - how many libertarians are there in the US? Surely there are much larger groups you can appeal to if votes is what you're after
There are dozens of us. And none of us can agree on anything.
Libertarians are like independents except no one wants to try to win us over
> Libertarians are like independents except no one wants to try to win us over
Because they are not like independents. Democrats have moved so far left that it's not even a question of who libertarians will vote for. The candidate just needs to show them a little attention so they remember to register and vote.
The Libertarian party got ~4.5 million votes in 2016. Getting some of those votes, or dissuading them from voting, is enough to make a difference in a tight race. See my other answer upthread for more context.
I don't think the crypto crowd was ever at risk of not voting for Trump, so I'm not sure what the advantage would have been with respect to them. However, the libertarian crowd was.
As a libertarian voter, the pardon for Ross was the only thing Trump did that actually gave me pause. To the point, I felt immensely guilty for not voting for him when I voted (L) because I knew[thought] I was damning Ross to a jail cell. It weighed on my conscience for a long time after the vote, and it wasn't until Trump won that I felt somewhat absolved of the guilt.
My personal opinion regarding the Ross pardon is that the Libertarian Party sold its soul for a donut. They could have gotten way more out of Trump than pardoning one particular Internet drug dealer.
Internet drug dealer doesn't really concern me, but he tried to have someone killed!
Look, when Jeffrey Bezos and Larry the Lawnmower ask Trump for an FBI investigation and send him over another solid gold turd or whatever bribe is fitting for such a request, they expect results.
> Silk Road drug lord
Oh please. Ross was no saint by any stretch and it does look like he may have made a very dark decision at one point, but it didn't happen in a vacuum. There's a mountain of details and nuance around that case, including a whole host of law enforcement abuses that many people would find distasteful if not sickening if they actually got the whole story.
>it does look like he may have made a very dark decision at one point
very vague. care to expound?
Entrapment. The FBI posed as a user and convinced Ross that some people needed to be taken out, offered to do it, arrested him, dropped the murder-for-hire charges because they didn't want to play that game in court, knowing it would backfire, while still using those (unconvicted) charges to publicly smear him and influence the judge's decisions, and finally stole his Bitcoin.
Two agents went to prison over this. Those same agents have a history of fraud and abuse.
So -- I can't imagine how one could expect to run a massive drugs-and-arms bazaar and not go to jail forever for something or other. But. I think the surrounding circumstances gp's alluding to might have involved [0] and [1] (with a fairly colorful slant). I'm inclined to give a little weight to the colorful account since the agents in question actually went to jail for the massive theft; for a more neutral treatment, Justice [2] and Vice [3] cover the situation.
The basic claim being that the salacious murder-for-hire bit was 1) never tried or proven, and 2) was allegedly instigated in part by federal agents (operating out of unrelated offices) and a "mentor" of Ulbricht's. In reaction to one of the federal agents himself stealing $800,000 from the criminal enterprise for himself in the course of his investigation. Or something like that.
I'm not clear how that squares with Ulbricht going on to order five more imaginary executions, but the whole thing seems awfully sordid from every angle.
[0] (Sarah Jeong for Forbes' contributor network thing) https://www.forbes.com/sites/sarahjeong/2015/04/17/could-the...
[1] (Andy Greenberg for Wired Magazine) https://archive.is/BvuQr
[2] https://www.justice.gov/archives/opa/pr/former-federal-agent... and https://www.justice.gov/archives/opa/pr/former-dea-agent-sen...
[3] https://www.vice.com/en/article/why-evidence-of-government-c...
We're pardoning fraudsters left and right who bribe the president.
But archive.is ... that people use to read and be informed about the world around them, better get that guy.
The US gov isn't a monolith. We don't know where the pressure came from or when the investigation started.
Follow the money
They didn't pay the bribe or tickle the diapered royal pink starfish properly. The Just Us system operates on the principle of favoritism with selective privilege/retribution rather than consistent fairness. They're perfectly fine having the DNI being a Russian mole and 47 rolling out the red carpet for a sanctioned war criminal.
In this day and age, MAANG, lacking integrity and values, bet on flattery and bribery as business expenses to ensure favorable treatment instead of punishment.
Exactly. Now that the facade of being a country with integrity and equality is thoroughly shattered, good luck getting public support for shutting down a website that lets people read news for free. Shit's on fire yo; we got bigger problems than that.
IMO this is more an indictment of the president being able to pardon (any president, not just current one).
IMO the president should, at best, be an additional appeals round. (But probably just not involved in the Judicial because separation of powers is good)
It is pretty sad that this is happening and that it apparently is at risk of just disappearing soon. I understand there are a lot of ethical concerns with that site, but if I use like the Internet Archive's Wayback Machine to try to save some specific documentation pages for certain proprietary software, it absolutely fails to actually save the content. So then it is just a bit more difficult to save a particular knowledge base article before it might get rewritten or updated.
Dumb question: why do news websites have such a hard time keeping users logged in? Like I can go an entire year without getting logged out of gmail. But can't go more than a few days before getting logged out of news websites.
I have subscribed to news sites and still use something like archive.is because it is faster than my paid experience.
Someone should make a site like archive.is that runs the saved page through an LLM to summarize the main points, and perhaps extract a few critical quotes (unfortunately, at the LLM's discretion, but better than nothing). The law is their greatest enemy.
i understand the initial inclination, but leaving our libraries worth of historical data at the whims of our tech monarchs seems like a bad idea.
we know for absolute fact they’ll remove or alter data to entirely change answers. we _know_ they’ll do this.
No, nobody should. I don't know why anyone trusts the gossip slop bots, given the extra work required. It's pretty brave to trust another word-guessing bot when it's unable to stop making stuff up. Might want to ask it about Smith-Mundt 2012. Bill says discern, a lot, for a reason.
...as markdown please.
Related HN discussion from 2023: https://news.ycombinator.com/item?id=37009598
The government can take down huge criminal networks on the darkweb but can't identify the owner of a clearnet site?
Since you refer to the darkweb: the gov has extensively studied Tor, likely has zero-day exploits for the Tor browser, and operates a bunch of Tor relays. Given enough time and effort it is very much possible for state actors to identify Tor users.
But unless you are a high profile gov target, Tor protects you well.
> unless you are a high profile gov target, Tor protects you well.
How do you really know that? I understand the theory, but do you have evidence? Have you tested it or read research that has tested it?
I would hesitate to give advice to people when they could get hurt.
> do you have evidence
Well it is of course not possible to 100% prove that Tor protects your privacy.
But the lack of evidence is also evidence. While we have evidence that the gov was able to identify individuals using Tor, e.g. those hosting drug portals, there have been no reports that individuals or companies are able to de-anonymize Tor users.
Tor was created by the US Navy.
And?
It’s intriguing trivia for people of all persuasions.
Government can exploit Firefox, more at 11.
That owner is not so simple - I recall how they alleged in a Wikipedia discussion he(?) used some botnet or proxy network for adding archive.is mirror links to Wiki entries: https://en.wikipedia.org/wiki/Wikipedia:Requests_for_comment...
They can and they will. Filing a subpoena for information is a step in that process.
If the WHOIS records are falsified they'll start looking at payment information.
Who said they can't?
How often do we read articles about authorities attempting to unmask suspected criminals? That the FBI is trying to identify this person seems to be noteworthy to the point that there are many people here commenting on the story, and the fact that they are trying and have not yet succeeded seems to indicate that it is not a foregone conclusion.
“[website] got a subpoena” stories are routine.
not sure what's taking the FBI so long, to me it seems obvious: https://drive.google.com/file/d/1M6PMQrehmeuRU_KDd_PTKsTtVNN...
I took the time to read that document a while back and it almost certainly isn't the correct guy. At the very least it provides 0 evidence other than concluding that "he must be the guy" due to his name, country of origin and programming background.
Yeah I just read through it and it presents absolutely no useful evidence. They establish that there's a developer in the US called Denis Petrov. They establish that someone involved with archive.today is often referred to as Denis Petrov. Then they make some weird leaps to conclude that they must be the same person.
A quick web search suggests Denis Petrov is not at all a unique name. Just because one of them wrote a somewhat feminist thought on a blog in 2004 and another forked a... let's call it "satirically feminist" project on GitHub does not in any way suggest they are the same person.
Yeah, Russian Wikipedia says "Petrov" is in top 10 of most frequent Russian surnames (3rd in one list, and 10th in another):
https://ru.wikipedia.org/wiki/%D0%9F%D0%B5%D1%82%D1%80%D0%BE...
I’m not certain either way, but part of the document tries to make a big deal about some GitHub profiles having the “arctic code vault archive” badge, and implying that has something to do with running an archive website.
Pretty much anyone who has made any kind of commit to an open source project has that badge.
read the same PDF a year or so back when someone spammed it across the archive.is blog, laughed when i got to that bit - it's pretty clear the person writing it doesn't know anything about development
edit: it's incredibly naive of them to immediately trust the WHOIS results. i can say from experience that these are never checked
Surprised he's American. I hope he finds refuge but there's not many places that either don't have strong IP laws or don't have US extraditions.
Looks like `archive.is` is currently using reCaptcha. So Google might be able to figure out and tell the FBI who runs it. (If not by data around the registration, then by data around accesses to the site that seem to be by a developer of it, coupled with their cross-site tracking data.)
I've also seen Cloudflare similarly in the loop, and they have similar cross-site tracking data.
Lesson: The same third-party tech surveillance companies to which you sell out all your visitors, can also violate you.
Coincidentally, their adoption of Google CAPTCHA (along with requiring javascript) is why I stopped using archive.today. I don't particularly want either of those entities executing mystery code in my browser, or on my computer at all.
Helping Google to collect records of my reading habits is also unappealing.
I hear it isn't actually reCaptcha, just made to look like it.
You can easily check this. It's an iframe of recaptcha.net, loaded in via a gstatic.com javascript file. So it is an actual reCaptcha
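A minimal sketch of that kind of check, assuming you have already saved the page's HTML as a string. The `third_party_hosts` helper and the sample markup are hypothetical and only illustrate the idea; the sample is not copied from archive.is.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class EmbedHostParser(HTMLParser):
    """Collects hostnames referenced by <iframe src=...> and <script src=...>."""

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag not in ("iframe", "script"):
            return
        src = dict(attrs).get("src")
        if src:
            host = urlparse(src).hostname  # None for relative URLs
            if host:
                self.hosts.add(host)


def third_party_hosts(html):
    """Return the sorted hostnames that iframes and external scripts load from."""
    parser = EmbedHostParser()
    parser.feed(html)
    return sorted(parser.hosts)


# Hypothetical markup for illustration only.
SAMPLE = """
<script src="https://www.gstatic.com/example/loader.js"></script>
<iframe src="https://www.recaptcha.net/example/anchor"></iframe>
<script>var inline = 1;</script>
"""

print(third_party_hosts(SAMPLE))
```

If the hosts listed for the real page include recaptcha.net and gstatic.com, the widget is being served by Google rather than imitated locally. The browser's network tab shows the same thing without any code.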
While strangely unpopular here, Yasha Levine's[0] well documented premise is that the entire existence of the internet is designed for surveillance and content control, down to the chip level, and this is mandated and enforced through laws as well as more covert agreements.
[0] https://www.amazon.com/Surveillance-Valley-Military-History-...
It's strangely unpopular because it's wrong in the one place techies care about: the details.
In broad strokes, it's true to say that the Internet was created as a surveillance and control tool. But this was not a big design up front with those goals as built-in capabilities. There's nothing in TCP/IP you can point to and say, "yes, this is the surveillance bit", or "yes, this is the government control bit". "Down to the chip level" is just plain wrong. Yes, you could argue that the Internet was enabling those things, but that's true of all communications technology, if not just the basic concept of human socialization[0].
And, in practice, if the US had actually intended for the Internet to be a surveillance and control tool, it was sure as shit really fucking bad at making use of it. The only country that actually realized it needed to censor the Internet to maintain cultural/social hegemony was China, which is why they got into network censorship early. By the time America realized it wanted that level of control it had to outsource the wetwork to creative industry and advertising companies.
[0] Most neurotypical people fail to recognize this.
> it's true to say that the Internet was created as a surveillance and control tool
I'd even argue against that, unless you got some strong evidence supporting such a claim. The mere presence of potentially traceable (IP) addresses is a technical necessity of point-to-point data transfer as contrasted to broadcast.
Techies really like to believe that they are building a bright future for humanity. Telling them that what they build is a high-tech concentration camp won't be received well.
"It is difficult to get a man to understand something, when his salary depends on his not understanding it."
-Upton Sinclair
Never knew archive.is was run by a "masked man"
This does not fit well with the current Cloudflare initiative.
Either the jurisdiction of a nation extends beyond its physical borders as long as there is a connection in digital space, or it does not.
If the former, EU regulations do apply to American companies and they have to comply or leave the market and make sure that their offerings are not available here.
If you're enough of a hypocrite, it makes perfect sense.
I discovered just yesterday that Verizon home internet blocks archive.is. Changing the router DNS from their default to openDNS fixed the problem for me, so it looks like they made only a nominal effort to block it.
That may be a Cloudflare DNS specific issue, there's been a long standing dispute between archive.today and them about some DNS details. https://webapps.stackexchange.com/questions/135222/why-does-...
This issue seemed to resolve itself sometime in the past year. I’m not sure if that’s because Cloudflare decided to surrender some of my PII in exchange for eDNS resolution or if archive.is finally stopped demanding it from them.
If I read that word salad correctly, Cloudflare says they're blocking it because they want to "protect the privacy" of users who do a DNS lookup of archive.today, to prevent the requester's IP address from being revealed to archive.
That seems ludicrous, given that after a DNS lookup, the next thing anybody does is to send an HTTP request, which obviously reveals that same IP address to the archive servers.
So it's an obvious and blatant lie by Cloudflare, and I wonder what their real reason is.
I notice the iamadamdev paywall bypasser extension was also taken down with a DMCA request.
Mirror https://github.com/nikolqyy/bypass-paywalls-chrome
Let's build and share more and better tools to help ensure poor kids are allowed to learn.
Information, knowledge, and education do not belong only to those with money.
Development is continuing here, the link you gave is just an old mirror:
https://gitflic.ru/project/magnolia1234/bypass-paywalls-fire...
Thanks, wikipedia has more info and good links to related stuff:
https://en.wikipedia.org/wiki/Bypass_Paywalls_Clean
see also esp:
https://en.wikipedia.org/wiki/12ft#Alternatives
Is there an easier way to get around all the complicated cookie selection? I don't care if they have 183 trackers. Do I need all those? Are they important to me? I suppose they are important to them. Isn't there just a 'no to all' or at least a 'just the bare minimum for state management'?
I use https://addons.mozilla.org/en-US/firefox/addon/consent-o-mat..., though it didn't work so well for this site.
Someone at archive needs to prepare a torrent file right away
I will be devastated if this site gets taken down. I subscribe to pinboard.in for personal website bookmarking but even that is not 100% guaranteed to successfully cache a copy of the page.
So sad this wasn't shared via https://archive.ph/L2u8Z
they're fighting the wrong enemy. News content is such low quality, archive.is is the only enjoyable way to consume them. Their articles aren't worth wading through relentless popups.
If I'm curious about a fact or story, it's chatgpt. If someone sends me a link, it's archive.is. When archive.is goes away, I'm never going to see a CNN, NYT, LAtimes/etc logo again.
News aggregator The Drudge Report recently started using archive.is links to articles. That might have angered some publishers.
I thought the government was shut down? Why is this funded but not SNAP?
ironic that archive.is itself has so many bot protections...!
"Make data accessible and preserve it" -> FBI "Use copyrighted data for LLM training and sell the product" -> Billions from NVIDIA
I doubt this has anything to do with copyright law. I'm certain it has everything to do with certain things needing memoryholing and archive.* operators' lack of compliance.
And this very news site's settings are "Data processing by advertising providers including personalised advertising with profiling - Consent required for free use," funnily enough
>According to this, Archive.today uses a botnet with changing IP addresses to circumvent anti-scraping measures.
Archive.today uses Tor exit nodes when all of its main server IPs are blocked, so I believe this to be a disingenuous claim.
Without the popups, care of the target: https://archive.ph/FEcEi
this is a waste of taxpayer funds.
This might not be about copyright. I generally avoid these mirror sites because they seem like the perfect opportunity for watering hole attacks. The challenge with a normal watering hole attack is that you have to control the site in question either by hacking or infiltrating it. Imagine however if you were able to act as a middle man to the most popular websites in the world, and people would voluntarily post links to your site all over the internet, including very valuable audiences (like HN). You would have free rein to selectively inject malware to just readers at targeted IP blocks, minimizing chances of detection because most users would never be served malware. The possibilities are endless, government espionage, corporate espionage, activists, political opponents.
To be clear I have no reason to believe specific instances of these sites are malicious, but I would be shocked if black hats weren't trying to get into this space in general.
For sure, you shouldn't just trust whatever random mirroring site pops up (in fact, you probably should trust almost none of them), but archive.is has established themselves pretty credibly IMHO. At some point it could turn, but I don't think we should kill them now just in case they turn at some point.
The fact that the FBI is involved, and given the insane amount of IP protection racket stuff going on, I think it's pretty highly likely this is all about copyright. I think the powerful interests care more about copyright than they do about most other things.
Maybe, but the subpoena doesn't shed light on what they are being investigated for. It is only demanding information.
The FBI could be investigating them for archive.today, they could be investigated because of that apparent botnet, they could be investigating them because some billionaire media mogul friend of the current POTUS is outraged at the loss of revenue. To the best of my knowledge, the reasons aren't public.
Still, it doesn't mean we shouldn't be asking questions or expressing concern over this.
The FBI is targeting Archive.is, a botnet-powered archive that has operated openly for years. What a fascinating mix of tech and legal exposure.
They are probably just using proxies to scrape from and not directly or knowingly using proxies supplied by botnets.
Knowingly or not, they're still an accomplice. They're basically the getaway driver for cached webpages. /s
Wait why should we have a problem with this?
We don’t have a problem. That was sarcasm.
Sorry im stupid
> Another private investigation from 2024 comes to a different conclusion. It names a software developer from New York as the alleged operator. According to this investigation, following the trail to Eastern Europe proved to be a red herring.
Any pointers to what this "private investigation" is? The other linked blog pointing to Russia (or at least a Russian) seems pretty convincing:
https://gyrovague.com/2023/08/05/archive-today-on-the-trail-...
I think it refers to this: https://drive.google.com/file/d/1M6PMQrehmeuRU_KDd_PTKsTtVNN...
It doesn't seem very convincing in its conclusions, but has some interesting information nonetheless. I searched for some info on this doc, and it seems that its author really did hire some private investigators. I even found a GoFundMe for it and places where the author asked for help in the early stages of their investigation. It seems they were trying to find the website owner because he hadn’t responded to requests to delete some personal things archived by a prolific stalker.
FBI could just create Great America Wall and block it, what's the problem?
Archive everything you can in Chinese, while you still can, if you think the folks over there may have posted anything worth you reading…
https://archive.ph/FEcEi
What is it that the FBI wants to hide rather than make public, and why?
> There are also indications that the operator(s) are based in Russia.
That's long been my assumption.
What I haven't known was whether this was good Russian people (culturally valuing literature and intellect) wanting to be able to access articles that they can't afford.
Nor whether it was or could become something sketchier (e.g., feeding spy databases, or one nice Chrome zero-day and strategic timing away from compromising engineering workstations at most US tech companies where an employee reads HN).
But what actually bothers me about the misc `archive.*` sites is how HN routinely uses them, for US tech company workers to circumvent paywalls for struggling journalism organizations. This piracy practice seems to have the unofficial blessing of the US tech investor firm that runs and moderates HN. Besides whatever laws this is breaking, subjectively, it feels to me like crossing an ethical line, and also (economically) like punching down.
I only very rarely read articles from these sites, and on these occasions I tend to get a lot of value out of it. I’d be happy to pay a dollar (or a couple bucks) per article. But I tend to forget about my subscriptions, and don’t read enough articles to justify it, so I’ll be wasting a lot of money buying it.
The problem with the paywalls is that everyone offers a subscription. If you want to read a single article you do not want to subscribe to some US newspaper.
x402 solves this.
X402 doesn’t solve this, because the publishers don’t want to sell you just a single article.
Just because they don't 'want' to, doesn't mean it's not a good solution. Clearly they're not getting me to pay for their whole site subscription so why not just sell me the article? The biggest issue with most of these services is the lack of consumer 'ease' by which the creator can actually get paid. It's the same reason why I'm seeing all my friends go back to piracy, it was nice when we had a consumer convenient place to consume our content. Just look at Steam, it's easier than ever to go pirate the majority of games, but my Steam library just keeps getting bigger. I'm not against buying things, I'm against shitty services.
I can't argue with your ethical concerns.
The internet is fundamentally different than print though—perhaps this fundamental change to journalism requires another way to pay the bills. (Advertising is the obvious one.)
Or maybe we, as a society (because of our internet ways) simply don't deserve these services any longer.
Perhaps the internet itself is the problem. What if instead that was the big mistake after all?
Called passion. Pays in spades. Something bots know not of. Includes carbon bots.
It seems there is not enough "passion" to pay the bills, sadly.
Btw their leadership names are listed on their website
The FBI is attempting to unmask the owner behind archive.today, a popular archiving site that is also regularly used to bypass paywalls on the internet and to avoid sending traffic to the original publishers of web content, according to a subpoena posted by the website. The FBI subpoena says it is part of a criminal investigation, though it does not provide any details about what alleged crime is being investigated. Archive.today is also popularly known by several of its mirrors, including archive.is and archive.ph.
The subpoena, which was posted on X by archive.today on October 30, was sent by the FBI to Tucows, a popular Canadian domain registrar. It demands that Tucows give the FBI the “customer or subscriber name, address of service, and billing address” and other information about the “customer behind archive.today.”
“THE INFORMATION SOUGHT THROUGH THIS SUBPOENA RELATES TO A FEDERAL CRIMINAL INVESTIGATION BEING CONDUCTED BY THE FBI,” the subpoena says. “YOUR COMPANY IS REQUIRED TO FURNISH THIS INFORMATION. YOU ARE REQUESTED NOT TO DISCLOSE THE EXISTENCE OF THIS SUBPOENA INDEFINITELY AS ANY SUCH DISCLOSURE COULD INTERFERE WITH AN ONGOING INVESTIGATION AND ENFORCEMENT OF THE LAW.”
The subpoena also requests “Local and long distance telephone connection records (examples include: incoming and outgoing calls, push-to-talk, and SMS/MMS connection records); Means and source of payment (including any credit card or bank account number); Records of session times and duration for Internet connectivity; Telephone or Instrument number (including IMEI, IMSI, UFMI, and ESN) and/or other customer/subscriber number(s) used to identify customer/subscriber, including any temporarily assigned network address (including Internet Protocol addresses); Types of service used (e.g. push-to-talk, text, three-way calling, email services, cloud computing, gaming services, etc.)”
-snip-
Read more: https://www.404media.co/fbi-tries-to-unmask-owner-of-infamou...
> YOU ARE REQUESTED NOT TO DISCLOSE THE EXISTENCE OF THIS SUBPOENA INDEFINITELY AS ANY SUCH DISCLOSURE COULD INTERFERE WITH AN ONGOING INVESTIGATION AND ENFORCEMENT OF THE LAW.
Is this actually a mere request, as in the receiver is _not_ required to avoid disclosure?
Separately—can't believe tucows is still around!
Isn't the whole thing a request? The FBI has no power in Canada unless they go through Canadian legal channels, no? If I received a subpoena from a foreign sovereign I would just use it as toilet paper.
Pretty sure if Tucows had been in the US, the "request" would have been a gag order or something alike.
Until you travel to the US 10 years later for someone's wedding or what not completely forgetting about the matter and you end up arrested
They probably cannot require this. They may be able to get you on interfering with their investigation if you disclosed with the intent of interfering. Probably adding this notice helps them prove you were aware of the potential to cause interference, at least. IANAL.
Well Tucows is Canadian so the FBI can take their “request” somewhere else.
As a Canadian, I really hope Tucows is going to send a particularly nasty response to the FBI. Canada should never collaborate with any US authorities!
> Canada should never collaborate with any US authorities!
Cross-border collaboration is a good thing. Our agencies regularly collaborate to bring people who feel insulted and emboldened to account for their crimes. This works both ways.
As someone who has dealt with media of me as a minor (around 11 years old) from Omegle being shared across the internet, the role archivers play in keeping illegal content “alive” isn’t well recognized. Thankfully, the Internet Archive has a mature process to purge pages that host illegal content.
We do not know what the investigation is for. All is up to speculation. Not all investigations are bad.
Here is an example on archive.is. I submitted multiple complaints to NCMEC but didn’t get results. Germany, though, was able to get the archives purged.
https://archive.is/https://ezgif.com/maker/*
On the page, you will see:
> In response to a request we received from 'jugendschutz.net' the page is not currently available.
That page held many, many images of minors. It is good that it is gone.
August 12, 2025 - Canadian Man Sentenced to 188 Months for Attempted Online Enticement of a Minor and Possessing Child Pornography [1]
August 21, 2024 - Canadian National Extradited To The United States Pleads Guilty To Production Of Child Sex Abuse Material And Enticement Of Minors [2]
December 20, 2024 - Extradited Canadian National Sentenced To Life In Federal Prison for producing child sexual abuse material and enticement of a minor [3]
[1] https://www.justice.gov/usao-ndny/pr/canadian-man-sentenced-...
[2] https://www.justice.gov/usao-mdfl/pr/canadian-national-extra...
[3] https://www.justice.gov/usao-mdfl/pr/extradited-canadian-nat...
> Cross-border collaboration is a good thing
IMO it's only a good thing when it's a good thing. There are plenty of reasons it could be a bad thing too. For example, Edward Snowden probably would have been hanged by now if Russia had collaborated across the border.
The term Cory Doctorow uses for this style of stunt is:
Felony contempt of business model.
Turns out our very own user, Saurik, came up with this term!
https://pluralistic.net/2022/10/23/how-to-fix-cars-by-breaki...
Wait is archive.is a bad website to go to?
Using FBI activity as a proxy for "good" or "bad" associations is folly.
One can only imagine the sharing and reading histories the operator has accumulated on the people using it. No restrictions on how the operator can use that data
Archive.today is very popular with HN commenters
Definition of "histories"
Records of every URL submitted and accessed by a given IP address+browser fingerprint^1
1. Archive.today sites use a form of "pixel tracking" to collect IP addresses via popular graphical browsers, the ones that are required to be used to solve CAPTCHAs, that automatically request URLs in HTML tags with the "src" attribute, e.g., "img" tags
2. Archive.today sites serve CAPTCHAs to some users^3 which force them to enable Javascript and share browser information
3. For example, those users not using or appearing to use popular graphical browsers
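To make the definition above concrete, here is a sketch of how any web server's request log could be folded into per-visitor histories. Everything in it is hypothetical: the `LogEntry` fields, the sample addresses (taken from documentation-reserved ranges), and the choice of IP plus User-Agent as a stand-in for "IP address+browser fingerprint". Nothing in this thread shows that archive.today actually does this.

```python
from collections import defaultdict
from typing import NamedTuple


class LogEntry(NamedTuple):
    """One hypothetical request record; field names are assumptions."""
    ip: str
    user_agent: str
    url: str


def build_histories(entries):
    """Group requested URLs by (ip, user_agent), preserving request order."""
    histories = defaultdict(list)
    for entry in entries:
        histories[(entry.ip, entry.user_agent)].append(entry.url)
    return dict(histories)


# Made-up log lines for illustration only.
LOG = [
    LogEntry("203.0.113.7", "BrowserA/1.0", "https://news.example/a"),
    LogEntry("198.51.100.2", "BrowserB/2.0", "https://news.example/b"),
    LogEntry("203.0.113.7", "BrowserA/1.0", "https://news.example/c"),
]

for visitor, urls in build_histories(LOG).items():
    print(visitor, urls)
```

The point is only that no account, email, or payment detail is needed to build such a record; whether an operator keeps or uses it is a separate question that outsiders cannot verify.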
What histories? Does archive.is take your email, phone, credit card, and passport pic when you want to read anything? The most there is is just an IP address in the server logs, for most users, rotated by their ISP on regular basis, easily obscured with a VPN.
This need to make IP-infringement sound ominous by invoking some ill-defined spy plot is a tired cliche.
There are many mechanisms, widely used, to aggregate information from many sources into a profile of you, and using your IP as an identifier isn't hard. Many lawsuits find their targets based on IP addresses, for example.
> easily obscured with a VPN
I think we can expect that commercial VPNs are compromised, at least by intelligence services. Imagine you opened a bar and advertised, 'dissidents come here to drink in privacy'. I'm sure you'd attract others too to an obviously target-rich environment.
When the proxy operator is (1) anonymous, (2) the potential target of coercion, and (3) outside the user's jurisdiction, how does a proxy user verify (a) whether the operator collects, or is being forced to collect, data or (b) what it might do, or be forced to do, with collected data?
Perhaps the answer is they don't verify
They don't collect any personal information - one of the reasons why it is so popular.
How do you know? One’s IP address counts as PII.
How do you know?
I had no idea archive.is was illegal… If you put massive holes in your paywall you get what you deserve IMO.
unequivocally bad move
I'm concerned that, even here on HN, we are underestimating both the magnitude of this looming conflict and also its inevitable conclusion.
The internet is bigger than you and me, and it's bigger than computers. It is an evolutionary force. It is not going to be stopped, and certainly not by states whose popularity and authority are waning so rapidly.
Moreover, the internet keeps as its core function the proclivity to copy and store bytes, and from this very simple mechanism emerges a large set of tools and norms that supplant nearly all of the ways that nation states perpetuate power.
What we desperately need are the elder statesmen and women to stand up and soberly see the writing on the wall, and gracefully deprecate the systems of which they will soon lose control, starting with nuclear weapons.
I don't believe this has to end in violence or acrimony of any kind. But we have run out of time to act like petulant children, crying that somebody took our empire away.
Very soon (ie, in the next couple centuries, maybe sooner), a few small nation states will adopt frameworks of zero IP, allowing all the content of the internet to be housed there, and from there, accessed by the entire planet.
Some other nation states may attempt some kind of embargo or sanctions, but these will obviously fail, just as the attempts by Russia and China to censor the internet within their borders have failed (and are failing with greater volume with each passing day). And before you cry that adoption in China is too low to support this conclusion, consider that the work to resist the GFW has begat some of the best networking tools in the world, with rapidity of evolution increasing, not decreasing. Even if fear and violence can stem adoption for a few decades or even centuries, the toolchain continues to grow and will tip the scales over sufficiently long time scales.
Let's not let this become a world information war. Let's install peace now while it's still easy to do. Let's dispense with IP and live in a world of joyous open access to information.
While we're here how does archive.today bypass paywalls?
>What scraper or headless browser are you using? it works so well.
>Before 2019 - PhantomJS, after - ordinary (not headless) Chromium/80 with few small patches.
https://blog.archive.today/post/618635148292964352/what-scra... (2020)
>Archive.today launches real browsers (not even headless) and tries to load lazy images, unroll folded content, login into accounts if prompted with login form, remove “subscribe our maillist” modals
https://blog.archive.today/post/642952252228812800/people-of...
I get that it convincingly simulates a human but so do I (because I am a human) and I don't get through the paywall...
There are some tricks which work for different websites - for example, for NYT it's enough to manually clear nytimes.com cookies, FT used to work after a click from twitter/x, and so on. So I guess there is some set of heuristics.
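To illustrate why clearing cookies can reset a metered paywall, here is a toy model. It assumes the meter is kept purely in a client-side cookie; the cookie name `articles_read` and the allowance of 3 are made up for the sketch, and no real site is claimed to work exactly this way.

```python
FREE_ARTICLES = 3  # hypothetical allowance, not any real site's value


def serve_article(cookies: dict) -> tuple[str, dict]:
    """Return (response, updated cookies) for one article request.

    The only state is the hypothetical "articles_read" cookie, so the
    server has no memory of a reader who shows up without it.
    """
    seen = int(cookies.get("articles_read", "0"))
    if seen >= FREE_ARTICLES:
        return "paywall", cookies
    return "full article", {**cookies, "articles_read": str(seen + 1)}


jar = {}
for _ in range(4):
    response, jar = serve_article(jar)
print(response)  # the fourth request is over the allowance

jar = {}  # "clearing cookies": the counter is gone
response, jar = serve_article(jar)
print(response)
```

A meter that is tracked server-side (by account or by IP) would not be reset this way, which may be why the trick only works on some sites.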
It seems that archive.is often has the full article for sites that are completely paywalled to every non-paying visitor: no cookie-driven freebies, nothing.
Publicly revealing everything they are doing would be a strategically bad idea, obviously.
It's not inconceivable that they actually pay for access to some of the sites; it wouldn't be surprising.
They are not actually bypassing paywalls - therefore I think they are on ethically good ground. Those sites show their full text to web crawlers - only not to humans. Basically, archive.is and similar services simulate a crawler through various means: headless browsers, better user-agent injection, etc.
I don't think that's true. If it was that simple, there would be browser plugins or other apps that would replicate that behavior. Do you know of any?
I always assumed so - because Google can index them full text. It used to be the case that you could see those full snapshots in Google's cache - this was before the sites strong-armed Google into making those snapshots inaccessible, after which the archive.* folk rose to power. You can test this yourself by searching for a unique quote from those sites and still getting hits in Google. But you are right - why could this not be achieved with a plugin then - don't know.
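One rough way to check the "full text for crawlers" theory is to request the same page with a browser-like and a crawler-like User-Agent and compare response sizes. This is only a sketch: `build_request`, `body_length`, and `compare` are names made up here, the browser string is an arbitrary example, and a matching size proves nothing either way since sites can verify real crawlers by other means (Google documents reverse-DNS verification for Googlebot), so a spoofed header may simply be ignored.

```python
import urllib.request

# Arbitrary browser-like string, chosen for illustration only.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)
# Googlebot's published desktop token.
CRAWLER_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build a GET request that differs only in its User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def body_length(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Fetch the page and return the size of the response body in bytes."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
        return len(resp.read())


def compare(url: str) -> dict:
    """Return body sizes for the two User-Agents; a large gap hints at cloaking."""
    return {
        "browser": body_length(url, BROWSER_UA),
        "crawler": body_length(url, CRAWLER_UA),
    }
```

Calling `compare(...)` on an article URL you are allowed to fetch gives two byte counts; nothing is requested until you call it.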
Perhaps not in the same way as described above, but BPC exists.
https://en.wikipedia.org/wiki/Bypass_Paywalls_Clean
I thought it just sees a full version for crawlers?
Nope, see r721's comment above yours for how it purportedly works.
Can they force DNS providers (ISPs, Cloudflare, etc.) to block these domains globally if they want to?
Cloudflare's DNS actually hasn't worked with archive.today for >5 years, due to the site returning bad results in response to Cloudflare not sending EDNS subnet info. HN comment from someone at Cloudflare: https://news.ycombinator.com/item?id=19828702
> Archive.is’s authoritative DNS servers return bad results to 1.1.1.1 when we query them. I’ve proposed we just fix it on our end but our team, quite rightly, said that too would violate the integrity of DNS and the privacy and security promises we made to our users when we launched the service.
> The archive.is owner has explained that he returns bad results to us because we don’t pass along the EDNS subnet information. This information leaks information about a requester’s IP and, in turn, sacrifices the privacy of users. This is especially problematic as we work to encrypt more DNS traffic since the request from Resolver to Authoritative DNS is typically unencrypted. We’re aware of real world examples where nationstate actors have monitored EDNS subnet information to track individuals, which was part of the motivation for the privacy and security policies of 1.1.1.1.
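For anyone wondering what the "EDNS subnet information" actually contains: per RFC 7871, a resolver can attach a Client Subnet option carrying a truncated prefix of the requester's address. A minimal sketch of the option's wire encoding follows; `encode_ecs` is a made-up helper name, and the addresses are from documentation ranges.

```python
import ipaddress
import struct

ECS_OPTION_CODE = 8  # EDNS Client Subnet option code (RFC 7871)


def encode_ecs(client_ip: str, prefix_len: int) -> bytes:
    """Encode an EDNS Client Subnet option as it would appear in an OPT record.

    Layout: OPTION-CODE (2), OPTION-LENGTH (2), FAMILY (2),
    SOURCE PREFIX-LENGTH (1), SCOPE PREFIX-LENGTH (1), ADDRESS (truncated).
    """
    # strict=False zeroes the host bits, so only the prefix is sent.
    network = ipaddress.ip_network(f"{client_ip}/{prefix_len}", strict=False)
    family = 1 if network.version == 4 else 2
    # Only as many address bytes as the prefix needs are included.
    addr_bytes = network.network_address.packed[: (prefix_len + 7) // 8]
    # Scope prefix-length is 0 in queries.
    payload = struct.pack("!HBB", family, prefix_len, 0) + addr_bytes
    return struct.pack("!HH", ECS_OPTION_CODE, len(payload)) + payload


print(encode_ecs("203.0.113.57", 24).hex())  # 0008000700011800cb0071
```

With a /24, the authoritative server sees `203.0.113.0/24` rather than the full address, which is still enough to narrow down a requester. That is the leak the quoted comment is describing.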
This was fixed/changed at some point. I use Cloudflare's DNS and it works fine for me.
This is the way to defend the free spirit of the internet.
"Infamous"? About as infamous as heise.de. Weird framing. Many people do not like the past being available for reference when they lie about it in the future. And that's what this federal attack stems from.
"who controls the past controls the future: who controls the present controls the past"
I just posted an excerpt of the 404 article but decided to link the source.
My thoughts exactly. This title is needlessly editorializing.
Why on earth did you get downvoted?
[dead]
Nothing infamous about it. It's the only way to stay informed from diverse sources since the proliferation of paywalls started.
And if no one pays for any of that content there will be zero ways to stay informed!
Information has existed long before the invention of money, and will outlive it.
IMO it is a false equivalence to equate piracy with theft like that.
I think 90-99% of anything pirated (or accessed by bypassing paywalls etc) would just never be bought/paid-for if there was no alternative.
Are there no ads on the site?
Isn't paying to remove paywalls another way?
If you are OK with getting your information from one or two sources - why not. You can also subscribe to a newspaper. But surely the internet can do (and did until relatively recently!) better than that.
The paywall epidemic is a recent phenomenon; internet media managed to exist before that.
We can't lose that site. Hacker news can't exist without it in this day and age of paywalls.
Oh no, how will so many people on HN fish for karma if they can’t contribute to conversations with an archive.whatever url that takes 2 seconds to generate on your own?
Not if it isn't available in my country, I can't. Personally, I'm grateful if somebody on here provides an archive link to an article I otherwise cannot read without additional technical intervention. To label it karma fishing is really uncharitable.
Uh I hope for the best, some websites opted-out of archive.org, so archive.is is my alternative.
Imagine tech journalists in 2025 not knowing what a canary is...
The FBI is conspiring against the owner of archive.is
I feel bad for the owner. He must be telling his friends "The FBI is out to get me" and they must think he's insane and they try to get him institutionalized...
The psychiatrist will note "Patient has delusions of grandeur; he thinks he is the owner of Wayback Machine and that the CIA is after him. Diagnosis: Paranoid Schizophrenia"
That is very funny. "AI" corporations are funding a scraper to subvert paywalls:
https://news.ycombinator.com/item?id=45835090
The FBI should investigate the "AI" companies and also the demise of Suchir Balaji, a copyright whistleblower who according to a sloppy local police investigation committed "suicide" hours after being seen cheerfully collecting a doordash delivery on CCTV.
AI provides shareholder value, while archive.is reduces shareholder value, and that's all that matters
I don't think it's even that abstract anymore. The AI companies donated to help Trump build a ballroom. archive.is didn't.
I'm so confused by these power dynamics. So you can torrent movies just fine from a Meta/Google office in Germany? No matter how you look at it, the White House has holes in the walls as in Idiocracy.
Copyright lobbyists and sport broadcasters, the ultimate overlords of the web.
[dead]
[dead]
[flagged]
I thought this was common knowledge? Did they try googling it?
The thought to google it doesn't really have a chance to enter their head if they don't know about it.
Maybe they ran into a paywall
A certain country is trying to scrub the internet of evidence for its war crimes.
I hope someone someday does a deep dive into when archive.org was taken down for a few weeks by hackers from a "pro-Palestinian" group. It felt like a black propaganda attack, especially with the very tame videos they shared on social media about the crimes their enemy committed (videos of buildings being blown up instead of innocent children).
It could be for something far more petty, like covering up speaking gaffes. When you give petty image-obsessed people a lot of power, they’ll use it for petty, image-preserving reasons.