Building Airbyte Connector for DOJ's Press Release API for Prison Reform Data Analytics Project

Summary

The user is looking for assistance in building an Airbyte connector for the DOJ’s press release API to gather data for a prison reform data analytics project. The project involves examining compliance with the First Step Act by the Bureau of Prisons and building an inventory of federal inmates.


Question

greetings airbyte! :slightly_smiling_face: anyone want to help me on a prison reform data analytics project? Specifically, assist me in building an airbyte connector for the <https://www.justice.gov/developer/api-documentation/api_v1|DOJ’s press release API>?

i am contributing to an american nonprofit initiative tied to measuring the effectiveness of, and compliance with, new prison reform laws (specifically <https://www.bop.gov/inmates/fsa/|the first step act>). we are working to examine how well the bureau of prisons is complying with fsa guidelines (see "The First Step Act: Ending Mass Incarceration in Federal Prisons", The Sentencing Project) - and the first step of this work is to build as comprehensive an inventory of current federal inmates as possible.

is there anyone who would like to join me as I put together an airbyte connector for the <https://www.justice.gov/developer/api-documentation/api_v1|DOJ’s press release API>? a typical request to this api looks like this: https://www.justice.gov/api/v1/press_releases.json?pagesize=50&page=0&sort=changed&direction=DESC
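For reference, the request above can be assembled from its query parameters. This is only a minimal sketch using the Python standard library: the helper name `build_press_release_url` is hypothetical, and the parameter names are simply the ones that appear in the example URL.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.justice.gov/api/v1/press_releases.json"


def build_press_release_url(page=0, pagesize=50, sort="changed", direction="DESC"):
    """Return the URL for one page of press releases (hypothetical helper)."""
    # Parameter names mirror the example request quoted in the message.
    query = urlencode({
        "pagesize": pagesize,
        "page": page,
        "sort": sort,
        "direction": direction,
    })
    return f"{BASE_URL}?{query}"


print(build_press_release_url())
```

Changing `page` while keeping the other arguments fixed gives the URL for each subsequent page.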

i’ve got the nlp chops to extract party names, ages, locations, case events (indictments/pleas/case dispositions/sentencing), charges, etc…Believe it or not, there is no openly available API to retrieve this type of case disposition data.

from this data I plan to then:

  1. run Inmate Locator lookups using each prisoner’s name/age to find their BOP inmateNum
  2. refresh Inmate Locator data on known inmates (using the BOP ID) on a cadence to examine whether the inmate’s projRelDate reflects our forecast of the FSA time credits they would have earned so far (based upon guidelines (Federal Register, "FSA Time Credits"), published rules on the subject (Federal Register, "Good Conduct Time Credit Under the First Step Act"), type of offense, a handful of assumptions, etc…)
  3. poll the doj press release site to discover new inmates as they move through the federal court system. i do not believe there is a method to request records published after a certain date. the best i can think of right now to support incremental sync is to <https://www.justice.gov/api/v1/press_releases.json?pagesize=50&page=0&fields=uuid,date,created,changed&parameters=[date]=1231243200|return results on or after a specific date> (it would be ideal if we could use the changed field to refresh our records vs date, but those timestamps are down to the actual second and I was unable to get wildcards to work)
    i’m sure there are easier ways of bringing this data into my pipeline - but i wanted to take the opportunity to learn more about airbyte (especially because of its vast connector ecosystem, i think it will be useful for other planned components of the project).
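One way to approximate the incremental sync described in step 3 is to filter coarsely by `date` on the server and then compare `changed` against a stored cursor on the client. The sketch below assumes that `changed` is a Unix timestamp in seconds (possibly returned as a string); the function names and sample records are made up for illustration, and the query shape is copied from the link in step 3 rather than verified against the API documentation.

```python
BASE_URL = "https://www.justice.gov/api/v1/press_releases.json"


def date_filtered_url(since_epoch, page=0, pagesize=50):
    """Build the date-filtered request (hypothetical helper)."""
    # Query shape copied from the link in step 3; treat it as an assumption.
    return (
        f"{BASE_URL}?pagesize={pagesize}&page={page}"
        f"&fields=uuid,date,created,changed&parameters=[date]={since_epoch}"
    )


def apply_cursor(records, cursor):
    """Keep records changed after `cursor` and return them with the advanced cursor."""
    # Assumption: `changed` is epoch seconds, possibly serialized as a string.
    fresh = [r for r in records if int(r["changed"]) > cursor]
    new_cursor = max([cursor] + [int(r["changed"]) for r in fresh])
    return fresh, new_cursor


# Made-up sample records, for illustration only.
sample = [
    {"uuid": "a", "changed": "1231243300"},
    {"uuid": "b", "changed": "1231243100"},
]
fresh, cursor = apply_cursor(sample, cursor=1231243200)
print([r["uuid"] for r in fresh], cursor)
```

Persisting the returned cursor between runs means each sync only keeps records whose `changed` value is newer than the previous run, even though the server-side filter works on `date`.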

Payment:

• I could probably get you a small stipend funded out of my own wallet
• barter nlp/ml work
• become a larger contributor to the project (it’s called wise first step, nothing has been published as of yet…press release will likely be coming from <https://www.law.gwu.edu/center-law-and-technology|gw law> + <https://www.law.northwestern.edu/legalclinic/wrongfulconvictions/|nw law> by march 1-15 :shushing_face:)



This topic has been created from a Slack thread to give it more visibility.

["airbyte-connector", "doj-press-release-api", "prison-reform", "data-analytics", "first-step-act", "bureau-of-prisons", "federal-inmates", "nlp", "incremental-sync"]

Hello! It’s great to hear about your project and your interest in using Airbyte for data integration. While I can’t directly assist you in building the connector, I can certainly guide you on how to do it.

Airbyte has a Python Connector Development Kit (CDK) that you can use to build a custom source connector. You can follow the tutorial "Build a connector to extract data from the Webflow API" as a reference. Although the tutorial is specific to the Webflow API, its steps and concepts generalize to many API data sources, including the DOJ’s press release API that you’re interested in. The tutorial covers authentication, pagination, dynamic stream generation, dynamic schema generation, connectivity checking, parsing of responses, and how to import a connector into the Airbyte UI.

Once you’ve built and validated your connector, you can contribute it back to the Airbyte community by creating a pull request on the Airbyte GitHub repo. An Airbyte engineer will then work with you to get your connector merged into the master branch so that it can be distributed and used by others.

If you need further assistance or have any questions, feel free to join the conversation on the Airbyte community Slack channel or participate in discussions on Airbyte’s Discourse forum. Good luck with your project!
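To make the pagination and response-parsing steps concrete, here is a rough sketch in plain Python. It is not the CDK's API, only an illustration of the loop a connector has to perform. It assumes each response is a JSON object with a `results` list, which is not confirmed in this thread; `read_all_pages` and `fake_fetch` are hypothetical names, and the stub stands in for a real HTTP call.

```python
def read_all_pages(fetch_page, pagesize=50):
    """Yield records page by page until a short (or empty) page is returned."""
    page = 0
    while True:
        payload = fetch_page(page, pagesize)
        # Assumption: records live under a top-level "results" key.
        results = payload.get("results", [])
        yield from results
        if len(results) < pagesize:
            break  # a short page means there is nothing left to read
        page += 1


# Stub data and fetcher, for illustration only; a real fetcher would
# request the press_releases.json endpoint and decode the JSON body.
FAKE_RECORDS = [{"uuid": str(i)} for i in range(5)]


def fake_fetch(page, pagesize):
    start = page * pagesize
    return {"results": FAKE_RECORDS[start:start + pagesize]}


records = list(read_all_pages(fake_fetch, pagesize=2))
print(len(records))
```

Passing the fetcher in as an argument keeps the paging logic testable without network access.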

:slightly_smiling_face: :slightly_smiling_face: I am also looking at https://docs.airbyte.com/connector-development/tutorials/cdk-speedrun :slightly_smiling_face:

Hi, <@U0245NZ4L7M> I could help you with building the source connector with the Low Code CDK.
The source API is open to the public with no auth method, right?

<@U0245NZ4L7M>
After some research, the endpoint is working fine with no authentication method and there are four endpoints available to act as source streams

I could definitely help you with building this source connector with Low Code CDK

<@U0245NZ4L7M>, hey there! I’m happy to help, for free, today and later this week. If you need help setting up your account on Cloud or OSS, I’m happy to help with that too. If you’ve already gotten started, great! If not, I’ll start building and give you a demo of the data that we can easily fetch later today.

hey there! did you figure this out?