Discussion:
[jira] [Created] (JCR-4369) Avoid S3 Incomplete Read Warning
Woonsan Ko (JIRA)
2018-09-04 17:53:00 UTC
Woonsan Ko created JCR-4369:
-------------------------------

Summary: Avoid S3 Incomplete Read Warning
Key: JCR-4369
URL: https://issues.apache.org/jira/browse/JCR-4369
Project: Jackrabbit Content Repository
Issue Type: Improvement
Components: jackrabbit-aws-ext
Reporter: Woonsan Ko


While using S3DataStore, the following logs are observed occasionally:
{noformat}
WARN [com.amazonaws.services.s3.internal.S3AbortableInputStream.close():178] Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
{noformat}
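For context, a minimal sketch of the ranged-GET remedy the warning itself suggests, using the AWS SDK for Java v1; the bucket, key, and range below are placeholders, not values from this issue:
{noformat}
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.S3Object;
import org.apache.commons.io.IOUtils;

static byte[] readFirstKilobyte(String bucket, String key) throws java.io.IOException {
    AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
    // Request only the bytes we need (0..1023 inclusive); reading this
    // ranged stream fully and then closing it does not trigger the warning.
    GetObjectRequest request = new GetObjectRequest(bucket, key).withRange(0, 1023);
    try (S3Object object = s3.getObject(request)) {
        return IOUtils.toByteArray(object.getObjectContent());
    }
}
{noformat}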

The warning logs are left not only by HTTP request processing threads but also by background threads, which made me suspect some issues in the {{S3Backend}} implementation itself, not just HTTP connections broken by the client.

After looking at the code, I noticed that {{CachingDataStore#proactiveCaching}} is enabled by default, which means the {{S3DataStore}} tries to _proactively_ download the binary content even when only metadata is accessed through {{#getLastModified(...)}} and {{#getLength(...)}}.

Anyway, the _minor_ problem is: whenever the {{S3Backend}} reads content (in other words, gets an input stream on an {{S3Object}}), it is recommended to either _read_ all the data or _abort_ the input stream. Merely closing the input stream is not good enough from the AWS SDK's perspective, which logs a warning if the input stream was neither fully read nor explicitly aborted. See the {{S3AbortableInputStream#close()}} method. \[1\]
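To illustrate the read-fully-or-abort contract (a hypothetical sketch, not code from Jackrabbit; the bucket and key are placeholders):
{noformat}
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.S3Object;
import com.amazonaws.services.s3.model.S3ObjectInputStream;

static void readHeaderOnly(String bucket, String key) throws java.io.IOException {
    AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
    S3Object object = s3.getObject(bucket, key);
    S3ObjectInputStream in = object.getObjectContent();
    try {
        byte[] header = new byte[256];
        in.read(header); // read only the beginning of the object
    } finally {
        // Calling in.close() here would log the "Not all bytes were read"
        // WARN, because unread bytes remain on the connection.
        in.abort(); // drops the HTTP connection without the warning
    }
}
{noformat}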

Therefore, {{org.apache.jackrabbit.core.data.LocalCache#store(String, InputStream)}} (used by {{CachingDataStore#getStream(DataIdentifier)}}) could be improved as follows (see the sketch after this list):
- If the local cache file doesn't exist, or the cache is in purge mode, keep the current behavior: copy everything to the local cache file and close the stream.
- Otherwise, {{abort}} the input stream.
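A rough sketch of the proposed branching (a hypothetical helper; {{purgeMode}} and the commons-io copy stand in for the actual {{LocalCache}} internals):
{noformat}
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import org.apache.commons.io.FileUtils;
import com.amazonaws.services.s3.model.S3ObjectInputStream;

static void store(File cacheFile, boolean purgeMode, InputStream in) throws IOException {
    if (!cacheFile.exists() || purgeMode) {
        // Current behavior: consume the stream fully into the local cache file.
        FileUtils.copyInputStreamToFile(in, cacheFile); // closes the stream when done
    } else if (in instanceof S3ObjectInputStream) {
        // Proposed: abort instead of close, since the content will not be read.
        ((S3ObjectInputStream) in).abort();
    } else {
        in.close();
    }
}
{noformat}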

\[1\] https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/internal/S3AbortableInputStream.java#L174-L187



Woonsan Ko (JIRA)
2018-09-04 17:56:00 UTC

Woonsan Ko updated JCR-4369:
----------------------------
Affects Version/s: 2.16.3, 2.17.5

Woonsan Ko (JIRA)
2018-09-04 17:56:00 UTC

Woonsan Ko updated JCR-4369:
----------------------------
Description:
While using S3DataStore, the following logs are observed occasionally:
{noformat}
WARN [com.amazonaws.services.s3.internal.S3AbortableInputStream.close():178] Not all bytes were read from the S3ObjectInputStream,
aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged
GET or drain the input stream after use.
{noformat}

The warning logs are left not only by HTTP request processing threads but also by background threads, which made me suspect some issues in the {{S3Backend}} implementation itself, not just HTTP connections broken by the client.

After looking at the code, I noticed that {{CachingDataStore#proactiveCaching}} is enabled by default, which means the {{S3DataStore}} tries to _proactively_ download the binary content even when only metadata is accessed through {{#getLastModified(...)}} and {{#getLength(...)}}.

Anyway, the _minor_ problem is: whenever the {{S3Backend}} reads content (in other words, gets an input stream on an {{S3Object}}), it is recommended to either _read_ all the data or _abort_ the input stream. Merely closing the input stream is not good enough from the AWS SDK's perspective, which logs a warning if the input stream was neither fully read nor explicitly aborted. See the {{S3AbortableInputStream#close()}} method. \[1\]

Therefore, {{org.apache.jackrabbit.core.data.LocalCache#store(String, InputStream)}} (used by {{CachingDataStore#getStream(DataIdentifier)}}) could be improved as follows:
- If the local cache file doesn't exist, or the cache is in purge mode, keep the current behavior: copy everything to the local cache file and close the stream.
- Otherwise, {{abort}} the input stream.

This is a known issue with the AWS SDK. \[2\] It seems that clients using the SDK need to _abort_ the input stream if they don't want to read the data fully.

\[1\] https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/internal/S3AbortableInputStream.java#L174-L187
\[2\] https://github.com/aws/aws-sdk-java/issues/1657

Woonsan Ko (JIRA)
2018-09-04 17:58:00 UTC

Woonsan Ko updated JCR-4369:
----------------------------
Description:
While using S3DataStore, the following logs are observed occasionally:
{noformat}
WARN [com.amazonaws.services.s3.internal.S3AbortableInputStream.close():178] Not all bytes were read from the S3ObjectInputStream,
aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged
GET or drain the input stream after use.
{noformat}

The warning logs are left not only by HTTP request processing threads but also by background threads, which made me suspect some issues in the {{S3Backend}} implementation itself, not just HTTP connections broken by the client.

By the way, this is not a major issue: the AWS SDK seems to log the warning only as a _recommendation_ in that case, while still closing the underlying HTTP request object properly. So, for the record, there is no functional problem.

After looking at the code, I noticed that {{CachingDataStore#proactiveCaching}} is enabled by default, which means the {{S3DataStore}} tries to _proactively_ download the binary content even when only metadata is accessed through {{#getLastModified(...)}} and {{#getLength(...)}}.

Anyway, the _minor_ problem is: whenever the {{S3Backend}} reads content (in other words, gets an input stream on an {{S3Object}}), it is recommended to either _read_ all the data or _abort_ the input stream. Merely closing the input stream is not good enough from the AWS SDK's perspective, which logs a warning if the input stream was neither fully read nor explicitly aborted. See the {{S3AbortableInputStream#close()}} method. \[1\]

Therefore, {{org.apache.jackrabbit.core.data.LocalCache#store(String, InputStream)}} (used by {{CachingDataStore#getStream(DataIdentifier)}}) could be improved as follows:
- If the local cache file doesn't exist, or the cache is in purge mode, keep the current behavior: copy everything to the local cache file and close the stream.
- Otherwise, {{abort}} the input stream.

This is a known issue with the AWS SDK. \[2\] It seems that clients using the SDK need to _abort_ the input stream if they don't want to read the data fully.

\[1\] https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/internal/S3AbortableInputStream.java#L174-L187
\[2\] https://github.com/aws/aws-sdk-java/issues/1657

Woonsan Ko (JIRA)
2018-09-04 18:06:00 UTC

Woonsan Ko updated JCR-4369:
----------------------------
Description:
While using S3DataStore, the following logs are observed occasionally:
{noformat}
WARN [com.amazonaws.services.s3.internal.S3AbortableInputStream.close():178] Not all bytes were read from the S3ObjectInputStream,
aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged
GET or drain the input stream after use.
{noformat}

The warning logs are left not only by HTTP request processing threads but also by background threads, which made me suspect some issues in the {{S3Backend}} implementation itself, not just HTTP connections broken by the client.

By the way, this is not a major issue: the AWS SDK seems to log the warning only as a _recommendation_ in that case, while still closing the underlying HTTP request object properly. So, for the record, there is no functional problem.

After looking at the code, I noticed that {{CachingDataStore#proactiveCaching}} is enabled by default, which means the {{S3DataStore}} tries to _proactively_ download the binary content even when only metadata is accessed through {{#getLastModified(...)}} and {{#getLength(...)}}.

Anyway, the _minor_ problem is: whenever the {{S3Backend}} reads content (in other words, gets an input stream on an {{S3Object}}), it is recommended to either _read_ all the data or _abort_ the input stream. Merely closing the input stream is not good enough from the AWS SDK's perspective, which logs a warning if the input stream was neither fully read nor explicitly aborted. See the {{S3AbortableInputStream#close()}} method. \[1\]

Therefore, {{org.apache.jackrabbit.core.data.LocalCache#store(String, InputStream)}} (used by {{CachingDataStore#getStream(DataIdentifier)}}) could be improved as follows:
- If the local cache file doesn't exist, or the cache is in purge mode, keep the current behavior: copy everything to the local cache file and close the stream.
- Otherwise, {{abort}} the input stream.

This is a known issue with the AWS SDK. \[2\] It seems that clients using the SDK need to _abort_ the input stream if they don't want to read the data fully.

\[1\] https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/internal/S3AbortableInputStream.java#L174-L187
\[2\] https://github.com/aws/aws-sdk-java/issues/1657

Woonsan Ko (JIRA)
2018-09-04 18:07:00 UTC

Woonsan Ko updated JCR-4369:
----------------------------
Description:
While using S3DataStore, the following logs are observed occasionally:
{noformat}
WARN [com.amazonaws.services.s3.internal.S3AbortableInputStream.close():178] Not all bytes were read from the S3ObjectInputStream,
aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged
GET or drain the input stream after use.
{noformat}

The warning logs are left not only by HTTP request processing threads but also by background threads, which made me suspect some issues in the {{S3DataStore}} implementation itself, not just HTTP connections broken by the client.

By the way, this is not a major issue: the AWS SDK seems to log the warning only as a _recommendation_ in that case, while still closing the underlying HTTP request object properly. So, for the record, there is no functional problem.

After looking at the code, I noticed that {{CachingDataStore#proactiveCaching}} is enabled by default, which means the {{S3DataStore}} tries to _proactively_ download the binary content even when only metadata is accessed through {{#getLastModified(...)}} and {{#getLength(...)}}.

Anyway, the _minor_ problem is: whenever the {{S3Backend}} reads content (in other words, gets an input stream on an {{S3Object}}), it is recommended to either _read_ all the data or _abort_ the input stream. Merely closing the input stream is not good enough from the AWS SDK's perspective, which logs a warning if the input stream was neither fully read nor explicitly aborted. See the {{S3AbortableInputStream#close()}} method. \[1\]

Therefore, {{org.apache.jackrabbit.core.data.LocalCache#store(String, InputStream)}} (used by {{CachingDataStore#getStream(DataIdentifier)}}) could be improved as follows:
- If the local cache file doesn't exist, or the cache is in purge mode, keep the current behavior: copy everything to the local cache file and close the stream.
- Otherwise, {{abort}} the input stream.

This is a known issue with the AWS SDK. \[2\] It seems that clients using the SDK need to _abort_ the input stream if they don't want to read the data fully.

\[1\] https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/internal/S3AbortableInputStream.java#L174-L187
\[2\] https://github.com/aws/aws-sdk-java/issues/1657

Woonsan Ko (JIRA)
2018-09-04 18:08:00 UTC

Woonsan Ko updated JCR-4369:
----------------------------
Description:
While using S3DataStore, the following logs are observed occasionally:
{noformat}
WARN [com.amazonaws.services.s3.internal.S3AbortableInputStream.close():178] Not all bytes were read from the S3ObjectInputStream,
aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged
GET or drain the input stream after use.
{noformat}

The warning logs are left not only by HTTP request processing threads but also by background threads, which made me suspect some issues in the {{S3DataStore}} implementation itself, not just HTTP connections broken by the client.

By the way, this is not a major issue: the AWS SDK seems to log the warning only as a _recommendation_ in that case, while still closing the underlying HTTP request object properly. So, for the record, there is no functional problem.

After looking at the code, I noticed that {{CachingDataStore#proactiveCaching}} is enabled by default, which means the {{S3DataStore}} tries to _proactively_ download the binary content, asynchronously in a new thread, even when only metadata is accessed through {{#getLastModified(...)}} and {{#getLength(...)}}.

Anyway, the _minor_ problem is: whenever the {{S3Backend}} reads content (in other words, gets an input stream on an {{S3Object}}), it is recommended to either _read_ all the data or _abort_ the input stream. Merely closing the input stream is not good enough from the AWS SDK's perspective, which logs a warning if the input stream was neither fully read nor explicitly aborted. See the {{S3AbortableInputStream#close()}} method. \[1\]

Therefore, {{org.apache.jackrabbit.core.data.LocalCache#store(String, InputStream)}} (used by {{CachingDataStore#getStream(DataIdentifier)}}) could be improved as follows:
- If the local cache file doesn't exist, or the cache is in purge mode, keep the current behavior: copy everything to the local cache file and close the stream.
- Otherwise, {{abort}} the input stream.

This is a known issue with the AWS SDK. \[2\] It seems that clients using the SDK need to _abort_ the input stream if they don't want to read the data fully.

\[1\] https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/internal/S3AbortableInputStream.java#L174-L187
\[2\] https://github.com/aws/aws-sdk-java/issues/1657

Woonsan Ko (JIRA)
2018-09-04 18:09:00 UTC

Woonsan Ko updated JCR-4369:
----------------------------
Description:
While using S3DataStore, the following logs are observed occasionally:
{noformat}
WARN [com.amazonaws.services.s3.internal.S3AbortableInputStream.close():178] Not all bytes were read from the S3ObjectInputStream,
aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged
GET or drain the input stream after use.
{noformat}

The warning logs are left not only by HTTP request processing threads but also by background threads, which made me suspect some issues in the {{S3DataStore}} implementation itself, not just HTTP connections broken by the client.

By the way, this is not a major issue: the AWS SDK seems to log the warning only as a _recommendation_ in that case, while still closing the underlying HTTP request object properly. So, for the record, there is no functional problem.

After looking at the code, I noticed that {{CachingDataStore#proactiveCaching}} is enabled by default, which means the {{S3DataStore}} tries to _proactively_ download the binary content, asynchronously in a new thread, even when only metadata is accessed through {{#getLastModified(...)}} and {{#getLength(...)}}.

Anyway, the _minor_ problem is: whenever the {{S3DataStore}} reads content (in other words, gets an input stream on an {{S3Object}}), it is recommended to either _read_ all the data or _abort_ the input stream. Merely _closing_ the input stream is not good enough from the AWS SDK's perspective, resulting in the warning above. See the {{S3AbortableInputStream#close()}} method. \[1\]

Therefore, {{org.apache.jackrabbit.core.data.LocalCache#store(String, InputStream)}} (used by {{CachingDataStore#getStream(DataIdentifier)}}) could be improved as follows:
- If the local cache file doesn't exist, or the cache is in purge mode, keep the current behavior: copy everything to the local cache file and close the stream.
- Otherwise, {{abort}} the input stream.

This is a known issue with the AWS SDK. \[2\] It seems that clients using the SDK need to _abort_ the input stream if they don't want to read the data fully.

\[1\] https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/internal/S3AbortableInputStream.java#L174-L187
\[2\] https://github.com/aws/aws-sdk-java/issues/1657

Woonsan Ko (JIRA)
2018-09-04 18:10:00 UTC

Woonsan Ko updated JCR-4369:
----------------------------
Description:
While using S3DataStore, the following logs are observed occasionally:
{noformat}
WARN [com.amazonaws.services.s3.internal.S3AbortableInputStream.close():178] Not all bytes were read from the S3ObjectInputStream,
aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged
GET or drain the input stream after use.
{noformat}

The warning logs are left not only by HTTP request processing threads but also by background threads, which made me suspect some issues in the {{S3DataStore}} implementation itself, not just HTTP connections broken by the client.

By the way, this is not a major issue: the AWS SDK seems to log the warning only as a _recommendation_ in that case, while still closing the underlying HTTP request object properly. So, for the record, there is no functional problem.

After looking at the code, I noticed that {{CachingDataStore#proactiveCaching}} is enabled by default, which means the {{S3DataStore}} tries to _proactively_ download the binary content, asynchronously in a new thread, even when only metadata is accessed through {{#getLastModified(...)}} and {{#getLength(...)}}.

Anyway, the _minor_ problem is: whenever the {{S3DataStore}} reads content (in other words, gets an input stream on an {{S3Object}}), it is recommended to either _read_ all the data or _abort_ the input stream. Merely _closing_ the input stream is not good enough from the AWS SDK's perspective, resulting in the warning above. See the {{S3AbortableInputStream#close()}} method. \[1\]

Therefore, {{org.apache.jackrabbit.core.data.LocalCache#store(String, InputStream)}} (used by {{CachingDataStore#getStream(DataIdentifier)}}) could be improved as follows:
- If the local cache file doesn't exist, or the cache is in purge mode, keep the current behavior: copy everything to the local cache file and close the stream.
- Otherwise, {{abort}} the input stream.

This is a known issue with the AWS SDK. \[2\] It seems that clients using the SDK need to _abort_ the input stream if they don't want to read the data fully.

\[1\] https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/internal/S3AbortableInputStream.java#L174-L187
\[2\] https://github.com/aws/aws-sdk-java/issues/1657

Woonsan Ko (JIRA)
2018-09-04 18:14:00 UTC

Woonsan Ko updated JCR-4369:
----------------------------
Description:
While using S3DataStore, the following logs are observed occasionally:
{noformat}
WARN [com.amazonaws.services.s3.internal.S3AbortableInputStream.close():178] Not all bytes were read from the S3ObjectInputStream,
aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged
GET or drain the input stream after use.
{noformat}

The warning logs are left not only by HTTP request processing threads but also by background threads, which made me suspect some issues in the {{S3DataStore}} implementation itself, not just HTTP connections broken by the client.

By the way, this is not a major issue: the AWS SDK seems to log the warning only as a _recommendation_ in that case, while still closing the underlying HTTP request object properly. So, for the record, there is no functional problem.

After looking at the code, I noticed that {{CachingDataStore#proactiveCaching}} is enabled by default, which means the {{S3DataStore}} tries to _proactively_ download the binary content, asynchronously in a new thread, even when only metadata is accessed through {{#getLastModified(...)}} and {{#getLength(...)}}.

Anyway, the _minor_ problem is: whenever the {{S3DataStore}} reads content (in other words, gets an input stream on an {{S3Object}}), it is recommended to either _read_ all the data or _abort_ the input stream. Merely _closing_ the input stream is not good enough from the AWS SDK's perspective, resulting in the warning above. See the {{S3AbortableInputStream#close()}} method. \[1\]

Therefore, some S3-related classes (such as {{org.apache.jackrabbit.core.data.LocalCache#store(String, InputStream)}}, {{CachingDataStore#getStream(DataIdentifier)}}, etc.) should be improved as follows (see the sketch after this list):
- If the local cache file doesn't exist, or the cache is in purge mode, keep the current behavior: copy everything to the local cache file and close the stream.
- Otherwise, {{abort}} the underlying {{S3ObjectInputStream}}.
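A hypothetical helper sketching that last bullet; the {{instanceof}} check assumes the caller still holds the raw SDK stream rather than a wrapper:
{noformat}
import java.io.IOException;
import java.io.InputStream;
import com.amazonaws.services.s3.model.S3ObjectInputStream;

// Abort S3-backed streams that will not be read to the end;
// fall back to a plain close() for everything else.
static void closeOrAbort(InputStream in) throws IOException {
    if (in instanceof S3ObjectInputStream) {
        ((S3ObjectInputStream) in).abort();
    } else {
        in.close();
    }
}
{noformat}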

The issue is a known one in the AWS toolkit. \[2\] It seems that clients using the toolkit need to _abort_ the input stream if they don't intend to read the data fully.

\[1\] https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/internal/S3AbortableInputStream.java#L174-L187
\[2\] https://github.com/aws/aws-sdk-java/issues/1657

Woonsan Ko (JIRA)
2018-09-04 18:21:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/JCR-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Woonsan Ko updated JCR-4369:
----------------------------
Description:
While using S3DataStore, the following logs are observed occasionally:
{noformat}
WARN [com.amazonaws.services.s3.internal.S3AbortableInputStream.close():178] Not all bytes were read from the S3ObjectInputStream,
aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged
GET or drain the input stream after use.
{noformat}

The warning logs are left not only by HTTP processing threads but also by background threads, which made me suspect some 'issues' in the {{S3DataStore}} implementation itself, not just broken HTTP connections on the client side.

By the way, this issue is not a major one: the AWS toolkit appears to log the warning only as a _recommendation_ in that case, while still closing the underlying HttpRequest object properly. So, for the record, there is no functional issue.

After looking at the code, I noticed that {{CachingDataStore#proactiveCaching}} is enabled by default, which means the {{S3DataStore}} tries to _proactively_ download the binary content, asynchronously in a new thread, even when only metadata is accessed through {{#getLastModified(...)}} and {{#getLength(...)}}.

Anyway, the _minor_ problem is that whenever the {{S3DataStore}} reads content (in other words, gets an input stream on an {{S3Object}}), it is recommended to either _read_ all the data or _abort_ the input stream. Merely _closing_ the input stream is not good enough from the AWS SDK's perspective and results in the warning. See the {{S3AbortableInputStream#close()}} method. \[1\]

Therefore, some S3-related classes (such as {{org.apache.jackrabbit.core.data.LocalCache#store(String, InputStream)}}, {{CachingDataStore#getStream(DataIdentifier)}}, etc.) should be improved as follows:
- If the local cache file doesn't exist, or the cache is in purge mode, keep the current behavior: just copy everything to the local cache file and close the stream.
- Otherwise, {{abort}} the underlying {{S3ObjectInputStream}}.

The issue is a known one in the AWS toolkit. \[2,3\] It seems that clients using the toolkit need to _abort_ the input stream if they don't intend to read the data fully.

\[1\] https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/internal/S3AbortableInputStream.java#L174-L187
\[2\] https://github.com/aws/aws-sdk-java/issues/1211
\[3\] https://github.com/aws/aws-sdk-java/issues/1657

Woonsan Ko (JIRA)
2018-09-05 13:00:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/JCR-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Woonsan Ko updated JCR-4369:
----------------------------
Description:
While using S3DataStore, the following logs are observed occasionally:
{noformat}
WARN [com.amazonaws.services.s3.internal.S3AbortableInputStream.close():178] Not all bytes were read from the S3ObjectInputStream,
aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged
GET or drain the input stream after use.
{noformat}

The warning logs are left not only by HTTP processing threads but also by background threads, which made me suspect some 'issues' in the {{S3DataStore}} implementation itself, not just broken HTTP connections on the client side.

By the way, this issue is not a major one: the AWS toolkit appears to log the warning only as a _recommendation_ in that case, while still closing the underlying HttpRequest object properly. So, for the record, there is no functional issue. It is only about the warning message and possibly sub-optimal HTTP request handling under the hood (on the AWS toolkit side).

After looking at the code, I noticed that {{CachingDataStore#proactiveCaching}} is enabled by default, which means the {{S3DataStore}} tries to _proactively_ download the binary content, asynchronously in a new thread, even when only metadata is accessed through {{#getLastModified(...)}} and {{#getLength(...)}}.

Anyway, the _minor_ problem is that whenever the {{S3DataStore}} reads content (in other words, gets an input stream on an {{S3Object}}), it is recommended to either _read_ all the data or _abort_ the input stream. Merely _closing_ the input stream is not good enough from the AWS SDK's perspective and results in the warning. See the {{S3AbortableInputStream#close()}} method. \[1\]

Therefore, some S3-related classes (such as {{org.apache.jackrabbit.core.data.LocalCache#store(String, InputStream)}}, {{CachingDataStore#getStream(DataIdentifier)}}, etc.) should be improved as follows:
- If the local cache file doesn't exist, or the cache is in purge mode, keep the current behavior: just copy everything to the local cache file and close the stream.
- Otherwise, {{abort}} the underlying {{S3ObjectInputStream}}.

The issue is a known one in the AWS toolkit. \[2,3\] It seems that clients using the toolkit need to _abort_ the input stream if they don't intend to read the data fully.

\[1\] https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/internal/S3AbortableInputStream.java#L174-L187
\[2\] https://github.com/aws/aws-sdk-java/issues/1211
\[3\] https://github.com/aws/aws-sdk-java/issues/1657

Woonsan Ko (JIRA)
2018-09-05 13:01:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/JCR-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604370#comment-16604370 ]

Woonsan Ko commented on JCR-4369:
---------------------------------

I'm preparing a pull request. Hope to provide one soon.
ASF GitHub Bot (JIRA)
2018-09-05 22:03:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/JCR-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604996#comment-16604996 ]

ASF GitHub Bot commented on JCR-4369:
-------------------------------------

GitHub user woonsan opened a pull request:

https://github.com/apache/jackrabbit/pull/61

JCR-4369: Avoid S3 Incomplete Read Warning by elegant aborting

The AWS S3 SDK recommends _aborting_ the `S3ObjectInputStream` if you do not intend to consume the data, because the connection-pool behavior of the underlying HttpClient (which drains the remaining data on close so the connection can be reused) could cause a big performance hit when a large amount of data is involved. From the AWS S3 SDK's perspective, it is better to simply abort the underlying `HttpRequestBase` and evict that connection from the pool.

In a multi-threaded environment (due to concurrent requests and/or the `proactiveCaching` mode of `CachingDataStore`), the **reading** and **storing** actions in `o.a.j.core.data.CachingDataStore.getStream(DataIdentifier)` can end up in the **else** block of `o.a.j.core.data.LocalCache.store(String, InputStream)`, because a file by that name may already exist by the time the **else** block executes. In that case, the `S3ObjectInputStream` is never read, nor aborted. As a result, `com.amazonaws.services.s3.internal.S3AbortableInputStream#close()` ends up complaining about an input stream that was neither aborted nor fully read.

Therefore, my fix includes the following:
- `LocalCache` checks whether the backend resource input stream is **abortable**. If it is, it _aborts_ the backend resource stream instead of closing it. For this purpose, a `BackendResourceAbortable` interface is introduced in jackrabbit-data (see the sketch after this list).
- `S3Backend` wraps the `S3ObjectInputStream` so that it implements `BackendResourceAbortable`, leveraging commons-io's `ProxyInputStream`.
- Some unit tests.
- Just FYI: I also tested this locally against an S3-compatible system (ref: https://github.com/woonsanko/hippo-davstore-demo/tree/feature/vfs-file-system#option-4-using-the-aws-s3-datastore-instead-of-vfs-datastore)
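For illustration, a minimal sketch of that shape; the class names other than `BackendResourceAbortable` and the single `abort()` method are assumptions, and the committed patch may differ in detail:
{code:java}
import java.io.IOException;
import java.io.InputStream;

import org.apache.commons.io.input.ProxyInputStream;

import com.amazonaws.services.s3.model.S3ObjectInputStream;

// Interface in jackrabbit-data: lets a cache abort a backend resource
// stream without depending on any backend SDK types.
interface BackendResourceAbortable {
    void abort();
}

// S3Backend-side wrapper: delegates all reads to the S3 stream but
// exposes abort() so that the cache can skip the costly drain-on-close.
class AbortableS3InputStream extends ProxyInputStream implements BackendResourceAbortable {

    private final S3ObjectInputStream s3in;

    AbortableS3InputStream(S3ObjectInputStream s3in) {
        super(s3in);
        this.s3in = s3in;
    }

    @Override
    public void abort() {
        // Aborts the underlying HTTP request instead of draining it.
        s3in.abort();
    }
}

// LocalCache-side disposal when the stream turns out not to be needed.
final class StreamDisposal {

    private StreamDisposal() {
    }

    static void closeOrAbort(InputStream in) throws IOException {
        if (in instanceof BackendResourceAbortable) {
            ((BackendResourceAbortable) in).abort();
        } else {
            in.close();
        }
    }
}
{code}
Keeping the interface in jackrabbit-data means `LocalCache` never has to reference AWS SDK types directly; only `S3Backend` knows about `S3ObjectInputStream`.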


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/woonsan/jackrabbit feature/JCR-4369

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jackrabbit/pull/61.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #61

----
commit 77f5a97c494eaae1c4b4beaa5a65b16153748df1
Author: Woonsan Ko <***@...>
Date: 2018-09-04T22:32:47Z

JCR-4369: let LocalCache be able to abort if stream is never read.

commit 98c26cae2a13c31317fbd2df08cc23df8c57b8c7
Author: Woonsan Ko <***@...>
Date: 2018-09-05T19:43:58Z

JCR-4369: unit test for whether LocalCache aborts or not.

----
Woonsan Ko (JIRA)
2018-09-05 22:09:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/JCR-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Woonsan Ko updated JCR-4369:
----------------------------
Fix Version/s: 2.17.6
2.18
Amit Jain (JIRA)
2018-09-06 12:12:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/JCR-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amit Jain reassigned JCR-4369:
------------------------------

Assignee: Amit Jain
Amit Jain (JIRA)
2018-09-07 05:51:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/JCR-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amit Jain resolved JCR-4369.
----------------------------
Resolution: Fixed

Thanks [~woon_san].

Patch committed with [http://svn.apache.org/viewvc?view=revision&revision=1840272]

I changed the package versions to what was suggested.
Woonsan Ko (JIRA)
2018-09-07 14:29:00 UTC
Permalink
[ https://issues.apache.org/jira/browse/JCR-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16607177#comment-16607177 ]

Woonsan Ko commented on JCR-4369:
---------------------------------

Thank you very much! :-)

Cheers,

Woonsan
Julian Reschke (JIRA)
2018-10-04 07:20:01 UTC
Permalink
[ https://issues.apache.org/jira/browse/JCR-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julian Reschke closed JCR-4369.
-------------------------------