S3 to Postgres - read 0 records

  • Is this your first time deploying Airbyte?: Yes
  • OS Version/Instance: macOS Monterey 12.5.1
  • Memory/Disk: 8GB RAM/ 250GB Disk
  • Deployment: Docker (run-ab-platform.sh)
  • Airbyte Version: 0.44.5
  • Source name/version: S3 (airbyte/source-s3:2.2.0)
  • Destination name/version: PostgreSQL (airbyte/destination-postgres:0.3.27)
  • Step: During sync
  • Description:

I have AWS Cost and Usage Reports (CUR) stored in an S3 bucket. I have set up Airbyte with an S3 source (access keys, path pattern, and bucket name) and a PostgreSQL destination, both fully configured. Both connection tests pass, and the S3 user has the correct privileges to access the bucket, but the sync doesn’t transfer any data.

The S3 connector fails to read any records from the AWS Cost and Usage Report. Note that the CUR files are stored under this path: daily-v1/<date_start>-<date_end>/<reportname>-uuid.csv.gz. Since Airbyte supports gzip archives, I believe this shouldn’t be a problem.
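One way to rule out a pattern mismatch is to test the configured pattern against an object key locally. The following is only a minimal sketch, assuming glob-style matching in which `*` stays within one path segment and `**` spans segments; whether the connector matches exactly this way is an assumption, not something confirmed here. The helper names `glob_to_regex` and `matches`, the example key, and the patterns are all made up for illustration.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob into a regex (assumed semantics, not the connector's code).

    `*`  matches within a single path segment (never crosses `/`).
    `**` matches across path segments.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")  # anything, including `/`
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")  # anything except `/`
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")  # one character except `/`
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def matches(pattern: str, key: str) -> bool:
    """Return True if the whole object key matches the glob pattern."""
    return glob_to_regex(pattern).match(key) is not None


# Hypothetical key shaped like the CUR layout described above.
example_key = "daily-v1/20230501-20230601/myreport-0001.csv.gz"

for candidate in ("*.csv.gz", "**/*.csv.gz", "daily-v1/*/*.csv.gz", "**"):
    print(f"{candidate!r:25} matches: {matches(candidate, example_key)}")
```

Under these assumed semantics, a pattern such as `*.csv.gz` would not match a key nested two directories deep, while `**/*.csv.gz` would, so a sync that reads 0 records without any error may simply mean no key matched.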

I have also tried granting more permissions to the AWS user used by Airbyte, and I’ve even tried manually uploading the uncompressed CSV to the bucket so that Airbyte can pick it up without having to deal with archives.
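To rule out a problem with the file itself (rather than with permissions or the connector), a downloaded copy of the archive can be checked locally. This is a sketch using only the Python standard library; `count_csv_rows` and the sample bytes are made up for illustration and stand in for the real report.

```python
import csv
import gzip
import io


def count_csv_rows(raw: bytes) -> int:
    """Count data rows (excluding the header) in a gzip-compressed CSV."""
    with gzip.open(io.BytesIO(raw), mode="rt", encoding="utf-8", newline="") as fh:
        reader = csv.reader(fh)
        next(reader, None)  # skip the header row, if there is one
        return sum(1 for _ in reader)


# Stand-in for the bytes of a downloaded .csv.gz report (hypothetical data).
sample = gzip.compress(b"line_item_id,cost\nabc,1.25\ndef,0.40\n")
row_count = count_csv_rows(sample)
print(f"data rows: {row_count}")
```

For a real file, the bytes would come from something like `open(path, "rb").read()`. If this reports a non-zero row count, the archive is readable and the cause is more likely in how the connector selects files.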

I’m not sure why the files aren’t picked up, since both connectors pass their checks. PostgreSQL creates two new tables, _airbyte_raw_table_name and table_name, but both are empty. I believe the problem lies in the S3 connector. Here are some logs that might help:

WriteConfig{streamName=aws_costs, namespace=null, outputSchemaName=public, tmpTableName=_airbyte_tmp_xrs_aws_costs, outputTableName=_airbyte_raw_aws_costs, syncMode=overwrite}
2023-05-31 10:22:25 destination > INFO i.a.i.d.b.BufferedStreamConsumer(startTracked):146 class io.airbyte.integrations.destination.buffered_stream_consumer.BufferedStreamConsumer started.
2023-05-31 10:22:25 destination > INFO i.a.i.d.j.JdbcBufferedConsumerFactory(lambda$onStartFunction$1):134 Preparing raw tables in destination started for 1 streams
2023-05-31 10:22:25 destination > INFO i.a.i.d.j.JdbcBufferedConsumerFactory(lambda$onStartFunction$1):139 Preparing raw table in destination started for stream aws_costs. schema: public, table name: _airbyte_raw_aws_costs
2023-05-31 10:22:26 source > Read 0 records from aws_costs stream
2023-05-31 10:22:26 source > Marking stream aws_costs as STOPPED
2023-05-31 10:22:26 source > {"type": "TRACE", "trace": {"type": "STREAM_STATUS", "emitted_at": 1685528546513.658, "stream_status": {"stream_descriptor": {"name": "aws_costs", "namespace": null}, "status": "COMPLETE"}}}
2023-05-31 10:22:26 source > Finished syncing aws_costs
2023-05-31 10:22:26 source > SourceS3 runtimes:
Syncing stream aws_costs 0:00:01.629692
2023-05-31 10:22:26 source > Finished syncing SourceS3
2023-05-31 10:22:26 INFO i.a.w.g.DefaultReplicationWorker(lambda$readFromSrcAndWriteToDstRunnable$5):361 - Source has no more messages, closing connection.
2023-05-31 10:22:26 INFO i.a.w.g.ReplicationWorkerHelper(endOfSource):62 - Total records read: 0 (0 bytes)
2023-05-31 10:22:26 INFO i.a.w.i.FieldSelector(reportMetrics):122 - Schema validation was performed to a max of 10 records with errors per stream.
2023-05-31 10:22:26 INFO i.a.w.i.HeartbeatTimeoutChaperone(runWithHeartbeatThread):111 - thread status... heartbeat thread: false , replication thread: true
2023-05-31 10:22:26 INFO i.a.w.g.DefaultReplicationWorker(replicate):248 - Waiting for source and destination threads to complete.
2023-05-31 10:22:26 INFO i.a.w.g.DefaultReplicationWorker(replicate):253 - One of source or destination thread complete. Waiting on the other.
2023-05-31 10:22:29 destination > INFO i.a.i.d.j.JdbcBufferedConsumerFactory(lambda$onStartFunction$1):152 Preparing raw tables in destination completed.
2023-05-31 10:22:29 destination > INFO i.a.i.b.FailureTrackingAirbyteMessageConsumer(close):80 Airbyte message consumer: succeeded.
2023-05-31 10:22:29 destination > INFO i.a.i.d.b.BufferedStreamConsumer(close):257 executing on success close procedure.
2023-05-31 10:22:29 destination > INFO i.a.i.d.j.SqlOperations(onDestinationCloseOperations):154 No onDestinationCloseOperations required for this destination.
2023-05-31 10:22:29 destination > INFO i.a.i.b.IntegrationRunner(runInternal):186 Completed integration: io.airbyte.integrations.base.ssh.SshWrappedDestination
2023-05-31 10:22:29 destination > INFO i.a.i.d.p.PostgresDestination(main):100 completed destination: class io.airbyte.integrations.destination.postgres.PostgresDestination
2023-05-31 10:22:30 INFO i.a.w.g.DefaultReplicationWorker(replicate):255 - Source and destination threads complete.
2023-05-31 10:22:30 INFO i.a.w.g.DefaultReplicationWorker(getReplicationOutput):449 - sync summary: {
  "status" : "completed",
  "recordsSynced" : 0,
  "bytesSynced" : 0,
  "startTime" : 1685528542831,
  "endTime" : 1685528550088,
  "totalStats" : {
    "bytesCommitted" : 0,
    "bytesEmitted" : 0,
    "destinationStateMessagesEmitted" : 0,
    "destinationWriteEndTime" : 1685528550073,
    "destinationWriteStartTime" : 1685528543051,
    "meanSecondsBeforeSourceStateMessageEmitted" : 0,
    "maxSecondsBeforeSourceStateMessageEmitted" : 0,
    "maxSecondsBetweenStateMessageEmittedandCommitted" : 0,
    "meanSecondsBetweenStateMessageEmittedandCommitted" : 0,
    "recordsEmitted" : 0,
    "recordsCommitted" : 0,
    "replicationEndTime" : 1685528550084,
    "replicationStartTime" : 1685528542831,
    "sourceReadEndTime" : 1685528546861,
    "sourceReadStartTime" : 1685528542967,
    "sourceStateMessagesEmitted" : 0
  },
  "streamStats" : [ ]
}
2023-05-31 10:22:30 INFO i.a.w.g.DefaultReplicationWorker(getReplicationOutput):450 - failures: [ ]
2023-05-31 10:22:30 INFO i.a.w.t.TemporalAttemptExecution(get):163 - Stopping cancellation check scheduling...
2023-05-31 10:22:30 INFO i.a.c.i.LineGobbler(voidCall):149 - 
2023-05-31 10:22:30 INFO i.a.c.i.LineGobbler(voidCall):149 - ----- END REPLICATION -----
2023-05-31 10:22:30 INFO i.a.c.i.LineGobbler(voidCall):149 - 
2023-05-31 10:22:30 INFO i.a.w.t.s.ReplicationActivityImpl(lambda$replicate$3):159 - sync summary: