Software Engineer, Archiving & Data Services (Remote)
Interested in a mission-driven job ensuring open access to
information for a global audience? Enjoy building technologies
and products critical to thousands of libraries, non-profits,
cultural heritage, and educational organizations worldwide?
Internet Archive is seeking a Software Engineer for its Archiving
& Data Services group. Internet Archive is a non-profit digital
library, top 200 website at archive.org, and an archive of over
100 petabytes of digital information running in many self-owned
and operated data centers. Internet Archive also provides
mission-aligned services to thousands of organizations worldwide,
working collaboratively to advance our goal of “Universal Access
to All Knowledge.”
We are seeking a Software Engineer to work in our 12-person
engineering team and help grow the Archiving & Data Services
group’s suite of digital archiving, data, and access services for
a global set of partner research, memory, and social good
organizations. All engineering roles will work closely with
product, program, and operations staff to engage in the full
software development lifecycle. Preliminary duties of the role
will primarily focus on developing Archive-It, our web archiving
service currently used by over 1000 partner organizations to
archive and provide public access to web collections totalling
hundreds of terabytes of data and billions of files each year.
The role may also assist with specific development projects for
other department services and systems. Reporting to the
department Head of Engineering, the Software Engineer will help
build and improve tools, technologies, and systems that support
values-aligned products, at petabyte scale, for a global
coalition of institutions providing open access to digital
information.
Key Responsibilities:
* Developing and maintaining department web, data, and archival
systems
* Writing and delivering high-quality software along with
automated tests
* Leading the evolution of the stateful systems underpinning our
services
* Assisting with production operation issues
* Collaborating with diverse stakeholders to translate
requirements and features into technical designs and software
solutions
* Developing deep expertise in a range of the technologies in our
stack
* Fostering a culture of collaboration, learning, and growth
* The role requires some travel to North America-based events and
meetings, including team and organization in-person meetings as
well as possible conferences, symposia, and partner-convening
events conducted as part of our programs.
Qualifications and Skills:
* At least 3 years of experience as a professional software
engineer
* Strong measurement and analysis skills. The ideal candidate is
experienced in adding the right instrumentation to diagnose
production issues, and can empirically qualify new infrastructure
to bring into our stack.
* Experience using Python to build web and data services is
preferred
* Expertise, or an interest in becoming an expert, in one or more
of the following stateful systems:
  * Postgres
  * Elasticsearch
  * Cassandra, Scylla, or other distributed KV databases
  * Temporal (temporal.io)
* Experience or proficiency in any of the following is a plus:
  * Web crawling or Django experience
  * Broad knowledge of the technologies and protocols underlying
  the web is valued over specific framework experience
  * Experience building solutions without managed services. We own
  and operate our own data centers running Linux virtual machines
  and self-administer the full stack.
  * Linux system administration skills
* Comfort working in a loosely structured environment requiring
individual autonomy and initiative within one's scope of
responsibilities
Job Details
This is a full-time, permanent, remote-first position working in
a distributed team. Candidates will need significant working-hours
overlap with colleagues based in North America (largely Pacific
Time). The role is open to candidates of varying seniority.
Compensation will be commensurate with experience and location,
with a general salary range of $110,000 to $125,000. Applicants
must be eligible to work in either the United States or Canada.
References must be made available upon request.
Benefits & Perks
The Internet Archive is a remote-first workplace and provides a
comprehensive benefits package including PTO, paid holidays, and
medical benefits. Depending on where you live, we also provide
these additional benefits: dental, vision, health savings
accounts, flex spending accounts, commuter benefits, short-term
disability, long-term disability, and retirement programs. At the
Internet Archive, we believe we do our best work when our
employees bring together diverse ideas. Members of all groups
underrepresented in the tech industry and library world are
strongly encouraged to apply. We are proud to be an equal
opportunity workplace and are committed to equal employment
opportunity regardless of race, color, religion, national origin,
age, sex, marital status, ancestry, physical or mental
disability, genetic information, veteran status, gender identity
or expression, sexual orientation, or any other characteristic
protected by applicable federal, state or local law. Internet
Archive is an Equal Opportunity Employer. Internet Archive
complies with the Fair Chance Ordinance. Internet Archive is a
501(c)(3) non-profit library founded in 1996.
You know that license does nothing?
You know that comment does nothing?
Anti Commercial-AI license
can you explain the background to understand this :))
please ;)
Putting a license on a comment isn’t how licensing works.
One party can’t unilaterally decide we are in a contract (what a license is).
If they moved their comment behind a wall and required clicking “I agree” then it would be a valid agreement.
It’s the same thing as people posting “don’t use my data” on their Facebook wall. It’s not how legal agreements work.
Any AI crawler will just suck up their data regardless of them putting a license in their comments.
If they truly want to stop AI from using their comment, they should advocate for a more robust robots.txt on their instance’s server.
Only a couple of bots respect that file anyway
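For anyone wondering what the robots.txt route looks like in practice, here is a rough sketch using Python's standard-library `urllib.robotparser`. The rules in `ROBOTS_TXT`, the `is_allowed` helper, and the example.org URL are made up for illustration; they are not taken from any real instance. The file is purely advisory: a polite crawler runs a check like this before fetching, and an impolite one simply skips it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt an instance admin might serve: block one
# AI crawler by its user-agent token and allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.org/post/1"  # placeholder URL
    for agent in ("GPTBot", "SomeOtherBot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, url))
```

Nothing in this enforces anything on the server side, which is the point made above: it only works for bots that choose to ask.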