Opened 8 years ago

Last modified 7 years ago

#8 new defect

Need a way to obtain Merkle proofs for a batch of certificates around the SCT timestamp

Reported by: eranm@… Owned by: eranm@…
Priority: major Milestone:
Component: client-privacy Version:
Severity: - Keywords:
Cc:

Description

Migrated from https://code.google.com/p/certificate-transparency/issues/detail?id=33.
From robstrad:

Section 7.3 says:

"In order to protect the clients' privacy, these checks need not reveal the exact certificate to the log. Clients can instead request Merkle proofs for a batch of certificates around the SCT timestamp."

Section 3 specifies two mechanisms to look up a single proof (get-proof-by-hash and get-entry-and-proof), but there's no mechanism defined to get "proofs for a batch of certificates around the SCT timestamp."

Presumably TLS Clients that audit (see Issue 25) and that want to protect clients' privacy will need a mechanism to do this!

Suggested text...

"4.9. Retrieve Merkle Audit Proofs from Log by Timestamp Range

GET https://<log server>/ct/v1/get-proofs-by-timestamp-range

Inputs:

   timestamp_1: The earliest timestamp.

   timestamp_2: The latest timestamp.

   tree_size: The tree_size of the tree for which the proofs are desired.
      The tree size must designate an existing STH.

Outputs:

   entries: An array of objects (one object for each entry added to the
      Merkle Tree between timestamp_1 and timestamp_2), each consisting of

      leaf_index: The 0-based index of the entry.

      audit_path: An array of base64-encoded Merkle Tree nodes proving
         the inclusion of the certificate."
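To make the suggested endpoint concrete, here is a minimal Python sketch of the client side. It assumes the inputs are sent as query parameters (as with the existing GET endpoints) and that the response is a JSON object shaped like the Outputs above. The endpoint is only a proposal in this ticket, and the helper names (`ProofEntry`, `build_request_url`, `parse_response`) are hypothetical.

```python
import base64
import json
from dataclasses import dataclass
from typing import List
from urllib.parse import urlencode


@dataclass
class ProofEntry:
    """One element of the proposed "entries" array (hypothetical type)."""
    leaf_index: int
    audit_path: List[bytes]  # base64-decoded Merkle Tree nodes


def build_request_url(log_server: str, timestamp_1: int, timestamp_2: int,
                      tree_size: int) -> str:
    """Build the GET URL for the proposed get-proofs-by-timestamp-range call."""
    if timestamp_1 > timestamp_2:
        raise ValueError("timestamp_1 must not be later than timestamp_2")
    query = urlencode({
        "timestamp_1": timestamp_1,
        "timestamp_2": timestamp_2,
        "tree_size": tree_size,
    })
    return f"https://{log_server}/ct/v1/get-proofs-by-timestamp-range?{query}"


def parse_response(body: str) -> List[ProofEntry]:
    """Decode the JSON response, base64-decoding every audit path node."""
    doc = json.loads(body)
    return [
        ProofEntry(
            leaf_index=int(entry["leaf_index"]),
            audit_path=[base64.b64decode(node) for node in entry["audit_path"]],
        )
        for entry in doc["entries"]
    ]
```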

IIUC, the Client would be able to locate the proof it is interested in by looking at the zeroth node in each entry's "audit_path" until it finds the node it is expecting to see.
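Another way a client could pick out its proof, assuming it can compute its own leaf hash from the certificate and SCT, is to recompute the root from each returned audit path and compare it with the root hash of the STH named by tree_size. The Python sketch below uses the RFC 6962 hashing rules and the inclusion-proof verification loop later written up in RFC 9162 Section 2.1.3.2. Leaves are raw byte strings here for brevity (in CT they would be serialized MerkleTreeLeaf structures), and `mth`/`audit_path` stand in for the log's side only so the example can run.

```python
import hashlib
from typing import List, Optional, Sequence, Tuple


def leaf_hash(leaf: bytes) -> bytes:
    """RFC 6962 leaf hash: SHA-256(0x00 || leaf)."""
    return hashlib.sha256(b"\x00" + leaf).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    """RFC 6962 interior node hash: SHA-256(0x01 || left || right)."""
    return hashlib.sha256(b"\x01" + left + right).digest()


def _split(n: int) -> int:
    """Largest power of two strictly less than n (requires n > 1)."""
    return 1 << ((n - 1).bit_length() - 1)


def mth(leaves: Sequence[bytes]) -> bytes:
    """Merkle Tree Hash of a list of raw leaves (log side, for the demo)."""
    if not leaves:
        return hashlib.sha256(b"").digest()
    if len(leaves) == 1:
        return leaf_hash(leaves[0])
    k = _split(len(leaves))
    return node_hash(mth(leaves[:k]), mth(leaves[k:]))


def audit_path(m: int, leaves: Sequence[bytes]) -> List[bytes]:
    """Audit path for leaf m, as a log would return it (log side, for the demo)."""
    if len(leaves) <= 1:
        return []
    k = _split(len(leaves))
    if m < k:
        return audit_path(m, leaves[:k]) + [mth(leaves[k:])]
    return audit_path(m - k, leaves[k:]) + [mth(leaves[:k])]


def root_from_path(leaf_index: int, tree_size: int, lh: bytes,
                   path: Sequence[bytes]) -> Optional[bytes]:
    """Root implied by an audit path, or None if the path has the wrong shape."""
    if leaf_index >= tree_size:
        return None
    fn, sn, r = leaf_index, tree_size - 1, lh
    for p in path:
        if sn == 0:
            return None
        if fn & 1 or fn == sn:
            r = node_hash(p, r)
            # Skip levels where this subtree has no right sibling.
            while not fn & 1 and fn != 0:
                fn >>= 1
                sn >>= 1
        else:
            r = node_hash(r, p)
        fn >>= 1
        sn >>= 1
    return r if sn == 0 else None


def find_my_proof(entries: Sequence[Tuple[int, Sequence[bytes]]], tree_size: int,
                  my_leaf_hash: bytes, sth_root: bytes) -> Optional[int]:
    """Return the leaf_index whose audit path proves my_leaf_hash, else None."""
    for leaf_index, path in entries:
        if root_from_path(leaf_index, tree_size, my_leaf_hash, path) == sth_root:
            return leaf_index
    return None
```

A client calling `find_my_proof` on a batch learns both which entry is its own and that the log has actually included it.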

Change History (10)

comment:1 in reply to: ↑ description Changed 8 years ago by rob.stradling@…

Replying to eranm@…:

Migrated from https://code.google.com/p/certificate-transparency/issues/detail?id=33.
From robstrad:

<snip>

Presumably TLS Clients that audit

...that audit SCTs. (Auditing STHs doesn't cause privacy concerns, I think)

(see Issue 25) and that want to protect clients' privacy will need a mechanism to do this!

It's Issue #3 on this issue tracker. (It was Issue 25 on the previous issue tracker).

comment:2 Changed 8 years ago by rob.stradling@…

Possible alternative idea: Provide a DNS-based equivalent of get-proof-by-hash.

Leaking the precise certificate of interest to the recursive resolver seems no more privacy-invasive than the A/AAAA lookup the Client would've already performed on the precise domain name of interest.

Also, given the inherently distributed nature of DNS, this approach would mean that Log servers wouldn't have to worry about scalability so much, since they wouldn't be servicing requests from billion(s) of TLS Clients.
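One possible shape for the DNS alternative, sketched in Python under an assumption of our own rather than anything specified here: the log serves proof data under its own zone, keyed by the leaf hash encoded as base32 (raw hash bytes are not valid DNS labels). The `hash.` label, the `ct.example.com` zone and `dns_proof_query_name` are hypothetical. As comment:2 notes, the recursive resolver sees the full leaf hash.

```python
import base64


def dns_proof_query_name(leaf_hash: bytes, log_zone: str) -> str:
    """Build a hypothetical DNS name for looking up a leaf by its hash.

    base32 fits DNS's case-insensitive character set; a 32-byte SHA-256
    hash encodes to 52 characters once the '=' padding is stripped,
    which is under the 63-octet DNS label limit.
    """
    label = base64.b32encode(leaf_hash).decode("ascii").rstrip("=")
    return f"{label}.hash.{log_zone}"
```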

comment:3 Changed 7 years ago by benl@…

Decision: we will provide a DNS alternative, but it is not a perfect solution.

Add an API to allow fetching of multiple entries in a single call (whether by timestamp or index TBD).

comment:4 Changed 7 years ago by eranm@…

  • Owner changed from draft-ietf-trans-rfc6962-bis@… to eranm@…

comment:5 Changed 7 years ago by eranm@…

After a short brainstorm with Al:

  • The issue with this approach is that, if only one certificate was added to the log in this time span, then the client effectively discloses to the log which certificate it was interested in.
  • What we're trying to avoid is disclosing the client's browsing habits to the log - we ignore attacks where the client's communication is sniffed or several logs collude to expose a client's identity (by comparing requests from the same IP).
  • Another problem is that we'd like to limit the number of proofs returned by the log server (similar to get-entries) but if the log arbitrarily cuts off the number of proofs it returns then the client may not get the proof for the certificate it's interested in.
  • An alternative approach might be requesting certificates whose hash value modulo some number equals some value. That has two problems: (1) for logs with very few certificates, the result could still be a single certificate. (2) for logs with a lot of certificates, the result could be a large number of certificates and the client has no way to request the "next range" of certificates that match this criterion. (also, this is a lot of work for the logs)
  • For both problems, the client can use the STH to figure out how many certificates are in the log and (assuming uniform distribution of certificate hashes) estimate how many certificates are likely to be returned by the log. This implies the client should choose the modulus and the value that it sends to the log server.
  • We will seek expert advice on the relationship between number of proofs fetched / method used to fetch them and privacy properties.
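The modulus-based variant and the size estimate from the bullets above can be sketched in Python as follows, assuming (as the bullets do) uniformly distributed leaf hashes. The function names are hypothetical, and `tree_size` would come from a verified STH.

```python
import hashlib
from typing import List, Sequence


def choose_modulus(tree_size: int, desired_count: int) -> int:
    """Pick a modulus so that roughly desired_count leaves share one residue.

    Each residue class holds about tree_size / modulus leaves. For small
    logs this degenerates to modulus 1 (the whole log), i.e. problem (1).
    """
    if tree_size <= 0 or desired_count <= 0:
        raise ValueError("tree_size and desired_count must be positive")
    return max(1, tree_size // desired_count)


def residue(leaf_hash: bytes, modulus: int) -> int:
    """Value the client sends: its own leaf hash, read as an integer, mod modulus."""
    return int.from_bytes(leaf_hash, "big") % modulus


def log_side_matches(leaf_hashes: Sequence[bytes], modulus: int,
                     value: int) -> List[bytes]:
    """What the log would return: every leaf hash with the requested residue."""
    return [h for h in leaf_hashes if residue(h, modulus) == value]
```

The client's own hash always lands in the returned batch, and the log only learns that it is one of roughly `desired_count` candidates.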

comment:6 Changed 7 years ago by benl@…

This is not actually particularly hard, but you do have to be a bit careful - essentially you choose a time window (that is initially very wide) and ask how many certs are in that window. You then repeatedly reduce the window until the desired number of certs (roughly) are in the window, and that's what you fetch.
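A minimal Python sketch of this reducing-window loop, assuming a counting call like the get-num-of-certs-in-window named in comment:7 (modelled here as a plain function argument; `shrink_window` is hypothetical). It halves the window each round and always keeps the half containing the SCT timestamp, so every window the log sees is nested inside the previous one and the whole query sequence reveals no more than the final window.

```python
from typing import Callable, Tuple

# (start, end) -> number of entries with timestamps in [start, end); hypothetical.
CountFn = Callable[[int, int], int]


def shrink_window(sct_timestamp: int, start: int, end: int, max_count: int,
                  count_in_window: CountFn) -> Tuple[int, int]:
    """Halve [start, end) around sct_timestamp until it holds <= max_count entries."""
    if not start <= sct_timestamp < end:
        raise ValueError("initial window must contain the SCT timestamp")
    while end - start > 1 and count_in_window(start, end) > max_count:
        mid = start + (end - start) // 2
        if sct_timestamp < mid:
            end = mid
        else:
            start = mid
    return start, end
```

With one entry per millisecond in [0, 1000), `shrink_window(250, 0, 1024, 100, count)` stops at (192, 256), holding 64 entries. A count oracle that doubles every answer makes the same call stop at (224, 256), half the window, which is the kind of lie discussed in comment:7.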

You could do the same thing with hash truncation, too (I kinda like that idea).
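Hash truncation can be sketched in Python as follows: instead of a time window, the client sends the leading `bits` bits of its own leaf hash, choosing `bits` from the STH's tree size so that roughly the desired number of leaves share that prefix. This assumes uniformly distributed hashes; the names are hypothetical. One possible attraction is that matching leaves form a contiguous range in a hash-sorted index.

```python
def prefix_bits(tree_size: int, desired_count: int) -> int:
    """Most leading bits b such that tree_size // 2**b >= desired_count."""
    if tree_size <= 0 or desired_count <= 0:
        raise ValueError("tree_size and desired_count must be positive")
    bits = 0
    while bits < 256 and (tree_size >> (bits + 1)) >= desired_count:
        bits += 1
    return bits


def hash_prefix(leaf_hash: bytes, bits: int) -> int:
    """The leading `bits` bits of leaf_hash as an integer (0 when bits == 0)."""
    total = len(leaf_hash) * 8
    if not 0 <= bits <= total:
        raise ValueError("bits out of range")
    return int.from_bytes(leaf_hash, "big") >> (total - bits)
```

For a log of 1,000,000 entries and a target of 100, this picks 13 bits (about 122 matches expected).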

comment:7 Changed 7 years ago by tom@…

On the "reducing window" solution, what stops the log server from returning false answers to calls to get-num-of-certs-in-window? The log server's incentive to lie would be to cause the client to reveal as small a window as possible.

Clients would be able to detect:

  • if the log server claimed a window contained more inclusion proofs than the log server could demonstrate via get-proofs-by-timestamp-range, and the client then called get-proofs-by-timestamp-range (which the client would do, when the window was sufficiently small/large)

Responses to get-proofs-by-timestamp-range would have to be signed for this misbehaviour to be accountable.
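A Python sketch of the client-side check in the bullet above: once the client has both the claimed count and the get-proofs-by-timestamp-range response for the same window and tree_size, it can flag a log that claimed more entries than it demonstrated. `check_count_claim` and its inputs are hypothetical, and each audit path would still need to be verified separately.

```python
from typing import List, Sequence


def check_count_claim(claimed_count: int, leaf_indices: Sequence[int],
                      tree_size: int) -> List[str]:
    """Compare a window's claimed entry count with the proofs actually returned.

    leaf_indices are the leaf_index values from get-proofs-by-timestamp-range
    for the same window and tree_size. An empty result means no
    inconsistency was detected by this check.
    """
    problems = []
    if len(set(leaf_indices)) != len(leaf_indices):
        problems.append("duplicate leaf_index values")
    if any(not 0 <= i < tree_size for i in leaf_indices):
        problems.append("leaf_index outside the tree")
    proven = len({i for i in leaf_indices if 0 <= i < tree_size})
    if proven < claimed_count:
        problems.append(f"claimed {claimed_count} entries but only {proven} were proven")
    return problems
```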

Clients would not be able to detect:

  • if the log server claimed a window contained more inclusion proofs than the log server could demonstrate via get-proofs-by-timestamp-range, but the client then didn't check by calling get-proofs-by-timestamp-range (why would log servers lie in this case? See the attack below)

All responses to get-proofs-by-timestamp-range would have to be gossiped to monitors, for this misbehaviour to be accountable.

Another "attack" would be for log servers to infer which inclusion proof the client was interested in, based on various factors: domain popularity, similarity between domains and linking with previous requests (if I request one window containing chocolate-cookies.com, and then another window containing vanilla-cookies.com, you can bet I'm into cookies), IP of client, etc.

The above two attacks could be combined:

  • receive a request to get-num-of-certs-in-window with range=[0,1000], asking for a time window that contains 98 unpopular domains and two popular domains: cookies-are-tasty.com at t=250 and cookies-are-horrible.com at t=750
  • Want to infer whether user thinks cookies are tasty or not
  • Suppose the client is willing to download 100 proofs, but not 200
  • Lie, claiming there are 200 results.
  • Client asks for range=[0,50]
  • Infer user wants cookies-are-tasty.com and hence likes cookies

Still, perfect is the enemy of good. The current proposal is an improvement on the status quo.

(To any log operators out there, I'll spare you the effort, cookies are tasty!)

Last edited 7 years ago by tom@…

comment:8 Changed 7 years ago by benl@…

http://trac.tools.ietf.org/wg/trans/trac/ticket/8#comment:7 is a fair point.

Possibly worth noting that if we fix the reducing window sizes and start points, we can pre-sign the responses to reduce load.
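One way to read "fix the reducing window sizes and start points": windows are aligned to a grid whose widths are a base width times a power of two, so only a bounded set of windows ever exists and the log could compute and sign their responses ahead of time. A Python sketch; `base_width`, the level scheme and the function names are assumptions for illustration.

```python
from typing import List, Tuple


def aligned_window(timestamp: int, level: int, base_width: int) -> Tuple[int, int]:
    """The grid window of width base_width * 2**level that contains timestamp."""
    if timestamp < 0 or level < 0 or base_width <= 0:
        raise ValueError("timestamp and level must be >= 0, base_width > 0")
    width = base_width << level
    start = timestamp - timestamp % width
    return start, start + width


def candidate_windows(timestamp: int, max_level: int,
                      base_width: int) -> List[Tuple[int, int]]:
    """Coarsest-to-finest nested grid windows a client could step through."""
    return [aligned_window(timestamp, lvl, base_width)
            for lvl in range(max_level, -1, -1)]
```

Because each finer window sits inside the coarser one, a client stepping down the levels still reveals only its final window, and every request it makes is one the log could have pre-signed.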

comment:9 Changed 7 years ago by eranm@…

Note:
We should punt this to a separate document. This is a big change, and it is still unclear how good it is as a privacy-preserving proof retrieval mechanism.
In the separate document we could explore private information retrieval (PIR) in addition to DNS.

comment:10 Changed 7 years ago by benl@…

  • Component changed from rfc6962-bis to client-privacy