{"sourceUrl":"https://www.hellointerview.com/learn/system-design/problem-breakdowns/dropbox","sourceType":"url","contentType":"Explainer","apex":{"label":"Dropbox System Design","children":[{"type":"CONC","parentId":"n1","text":"Dropbox is a cloud-based file storage service for storing and sharing files securely and reliably across devices.","id":"n2","label":"Problem Understanding","children":[]},{"label":"Functional Requirements","children":[{"children":[],"label":"Core: Upload Files","id":"n4","parentId":"n3","text":"Users should be able to upload a file from any device.","type":"DETL"},{"label":"Core: Download Files","children":[],"type":"DETL","parentId":"n3","text":"Users should be able to download a file from any device.","id":"n5"},{"children":[],"label":"Core: Share Files","type":"DETL","id":"n6","parentId":"n3","text":"Users should be able to share files with others and view shared files."},{"label":"Core: Sync Files","children":[],"type":"DETL","parentId":"n3","text":"Users can automatically sync files across devices.","id":"n7"},{"children":[],"label":"Out of Scope: Edit Files","type":"DETL","id":"n8","parentId":"n3","text":"Users should not be able to edit files directly within the system."},{"label":"Out of Scope: View Without Download","children":[],"type":"DETL","parentId":"n3","text":"Users should not be able to view files without downloading them first.","id":"n9"},{"label":"Blob Storage Design Out of Scope","children":[],"type":"INSG","parentId":"n3","text":"Designing Blob Storage itself is outside the scope of this problem, but researching it is suggested.","id":"n10"}],"type":"CONC","parentId":"n1","text":"Core functional requirements for the Dropbox system are defined, alongside out-of-scope items.","id":"n3"},{"type":"CONC","parentId":"n1","text":"Key non-functional requirements for the system, including availability, latency, security, and reliability, are outlined.","id":"n11","label":"Non-Functional 
Requirements","children":[{"id":"n12","parentId":"n11","text":"The system should prioritize availability over consistency.","type":"DETL","children":[],"label":"Core: High Availability"},{"parentId":"n11","text":"The system should support files as large as 50GB.","id":"n13","type":"DETL","label":"Core: Large File Support","children":[]},{"parentId":"n11","text":"The system should be secure, reliable, and able to recover lost or corrupted files.","id":"n14","type":"DETL","label":"Core: Security and Reliability","children":[]},{"parentId":"n11","text":"Upload, download, and sync times should be as fast as possible.","id":"n15","type":"DETL","label":"Core: Low Latency","children":[]},{"type":"DETL","parentId":"n11","text":"The system should not have a storage limit per user.","id":"n16","label":"Out of Scope: Storage Limit","children":[]},{"type":"DETL","parentId":"n11","text":"The system should not support file versioning.","id":"n17","label":"Out of Scope: File Versioning","children":[]},{"children":[],"label":"Out of Scope: Virus Scanning","type":"DETL","id":"n18","parentId":"n11","text":"The system should not scan files for viruses and malware."},{"children":[{"parentId":"n19","text":"A stock trading app requires consistency, meaning a buy transaction must be replicated globally before subsequent buys.","id":"n20","type":"EXMP","label":"Stock Trading App Consistency","children":[]},{"parentId":"n19","text":"For Dropbox, it is acceptable if an uploaded file is not immediately visible globally for a few seconds.","id":"n21","type":"EXMP","label":"Dropbox Eventual Consistency","children":[]}],"label":"CAP Theorem Trade-off","id":"n19","parentId":"n11","text":"For file storage, prioritizing availability over consistency is acceptable, unlike applications requiring immediate consistency.","type":"INSG"}]},{"parentId":"n1","text":"The initial setup involves planning the design approach and defining core entities for the system.","id":"n22","type":"CONC","label":"System 
Set Up","children":[{"parentId":"n22","text":"The design strategy involves building sequentially through functional requirements, then using non-functional requirements for deep dives.","id":"n23","type":"DCSN","label":"Planning Approach","children":[]},{"id":"n24","parentId":"n22","text":"Defining primary entities early provides a foundation for the system's API and high-level design.","type":"CONC","children":[{"children":[],"label":"File Entity","type":"DETL","id":"n25","parentId":"n24","text":"The File entity represents the raw data that users upload, download, and share."},{"children":[],"label":"FileMetadata Entity","type":"DETL","id":"n26","parentId":"n24","text":"FileMetadata includes information like file name, size, mime type, and uploader."},{"children":[],"label":"User Entity","type":"DETL","id":"n27","parentId":"n24","text":"The User entity represents the system's users."}],"label":"Core Entities Definition"},{"label":"API or System Interface","children":[{"id":"n29","parentId":"n28","text":"An initial endpoint for uploading a file might be POST /files with File and FileMetadata in the request.","type":"DETL","children":[],"label":"Upload API Endpoint"},{"type":"DETL","parentId":"n28","text":"An initial endpoint for downloading a file can be GET /files/{fileId} returning File & FileMetadata.","id":"n30","label":"Download API Endpoint","children":[]},{"label":"Share API Endpoint","children":[],"type":"DETL","parentId":"n28","text":"An initial endpoint for sharing a file might be POST /files/{fileId}/share with an array of User IDs.","id":"n31"},{"type":"DETL","id":"n32","parentId":"n28","text":"An endpoint to query changes for syncing can be GET /files/changes?since={timestamp} returning ChangeEvent[].","children":[],"label":"Sync API Endpoint"},{"children":[],"label":"ChangeEvent Details","id":"n33","parentId":"n28","text":"Each ChangeEvent includes fileId, change type (created, updated, deleted), and updated metadata.","type":"DETL"},{"label":"API 
Evolution Expectation","children":[],"type":"INSG","parentId":"n28","text":"APIs may change or evolve during the design process, which should be communicated to the interviewer.","id":"n34"},{"children":[],"label":"User Info in Headers","id":"n35","parentId":"n28","text":"User authentication information (session token or JWT) should be passed in request headers for security.","type":"DETL"},{"label":"Avoid User Info in Body","children":[],"type":"JUST","parentId":"n28","text":"Passing user information in the request body should be avoided as it can be manipulated by the client.","id":"n36"}],"type":"CONC","parentId":"n22","text":"Defining the API early guides the high-level design, with endpoints for each functional requirement.","id":"n28"}]},{"type":"CONC","id":"n37","parentId":"n1","text":"The high-level design aims to satisfy all functional requirements first, then layer in non-functional requirements.","children":[{"children":[{"label":"Metadata Storage","children":[],"type":"DETL","parentId":"n38","text":"File metadata can be stored in a NoSQL database like DynamoDB, which supports loosely structured data.","id":"n39"},{"type":"DETL","parentId":"n38","text":"A basic schema includes id, name, size, mimeType, and uploadedBy fields.","id":"n40","label":"FileMetadata Schema Example","children":[]},{"children":[{"type":"CONC","parentId":"n41","text":"This approach has scalability and reliability issues as file numbers grow and server failures occur.","id":"n42","label":"Challenges: Direct to Backend","children":[]}],"label":"Approach 1: Direct to Backend","type":"CONC","id":"n41","parentId":"n38","text":"The simplest approach is uploading files directly to a File Service backend server and storing them on its local file system."},{"children":[{"children":[],"label":"Blob Storage Benefits","id":"n44","parentId":"n43","text":"Blob Storage handles scaling, offers high reliability, and provides features like lifecycle policies and 
versioning.","type":"JUST"},{"children":[],"label":"Challenges: Store in Blob Storage","id":"n45","parentId":"n43","text":"This approach is more complex, requiring integration with Blob Storage and handling transactional consistency between file and metadata uploads.","type":"CONC"},{"id":"n46","parentId":"n43","text":"This approach redundantly uploads files twice: once to the backend and once to Blob Storage.","type":"DETL","children":[],"label":"Double Upload Redundancy"}],"label":"Approach 2: Store in Blob Storage","id":"n43","parentId":"n38","text":"A better approach is storing files in a Blob Storage service (e.g., Amazon S3, Google Cloud Storage) while metadata goes to the database.","type":"CONC"},{"children":[{"parentId":"n47","text":"Direct upload is faster and cheaper, bypassing the backend server for file transfer.","id":"n48","type":"JUST","label":"Direct Upload Benefits","children":[]},{"parentId":"n47","text":"Presigned URLs grant temporary permission to upload a file to a specific Blob Storage location.","id":"n49","type":"DETL","label":"Presigned URL Purpose","children":[]},{"label":"Three-Step Upload Process","children":[],"parentId":"n47","text":"The upload process becomes a three-step sequence involving requesting a URL, uploading, and notification.","id":"n50","type":"DETL"},{"label":"Step 1: Request Presigned URL","children":[],"type":"DETL","parentId":"n47","text":"Client requests a presigned URL from the backend, which saves file metadata with 'uploading' status.","id":"n51"},{"type":"DETL","parentId":"n47","text":"Client uses the presigned URL for a PUT request to upload the file directly to Blob Storage.","id":"n52","label":"Step 2: Upload to Blob Storage","children":[]},{"parentId":"n47","text":"Blob Storage sends a notification to the backend, which updates file metadata status to 'uploaded'.","id":"n53","type":"DETL","label":"Step 3: Update Metadata","children":[]},{"type":"INSG","parentId":"n38","text":"Direct upload with presigned URLs 
is a classic pattern for efficient large file transfers, bypassing application servers for data transfer.","id":"n54","label":"Pattern: Handling Large Blobs","children":[]}],"label":"Approach 3: Direct Upload to Blob Storage","type":"CONC","id":"n47","parentId":"n38","text":"The best approach allows users to upload files directly to Blob Storage from the client using presigned URLs."}],"label":"File Upload Design","id":"n38","parentId":"n37","text":"Designing how users upload files from any device involves storing file contents and metadata.","type":"CONC"},{"label":"File Download Design","children":[{"parentId":"n55","text":"The most common solution involves downloading the file first from Blob Storage to the backend, then to the client.","id":"n56","type":"CONC","label":"Approach 1: Via Backend","children":[{"type":"CONC","id":"n57","parentId":"n56","text":"This approach is suboptimal, leading to slower speeds and increased costs due to double downloads.","children":[],"label":"Challenges: Via Backend"}]},{"label":"Approach 2: Direct from Blob Storage","children":[{"type":"DETL","parentId":"n58","text":"Client requests a presigned download URL from the backend, then uses it to download the file directly.","id":"n59","label":"Presigned URL Download Process","children":[]},{"children":[],"label":"Challenges: Direct from Blob Storage","type":"CONC","id":"n60","parentId":"n58","text":"While nearly optimal, this approach can be slow for geographically distributed users due to single-region Blob Storage."}],"type":"CONC","parentId":"n55","text":"A better approach is allowing users to download files directly from Blob Storage using presigned URLs.","id":"n58"},{"type":"CONC","id":"n61","parentId":"n55","text":"The best approach uses a CDN to cache files closer to users, reducing latency and speeding up downloads.","children":[{"type":"JUST","parentId":"n61","text":"CDNs serve files from the closest server, significantly faster than direct backend or Blob Storage 
access.","id":"n62","label":"CDN Benefits","children":[]},{"children":[],"label":"CDN Signed URLs","type":"DETL","id":"n63","parentId":"n61","text":"For security, CDN signed URLs provide temporary, permission-based access for file downloads."},{"label":"Challenges: CDN Cost Management","children":[],"parentId":"n61","text":"CDNs are expensive, requiring strategic caching policies for file caching duration and invalidation.","id":"n64","type":"CONC"},{"id":"n65","parentId":"n61","text":"Cache control headers specify how long files should be cached, optimizing cost and performance.","type":"DETL","children":[],"label":"Cache Control Headers"},{"type":"DETL","id":"n66","parentId":"n61","text":"Cache invalidation removes updated or deleted files from the CDN to ensure fresh content.","children":[],"label":"Cache Invalidation"}],"label":"Approach 3: Via Content Delivery Network (CDN)"}],"type":"CONC","parentId":"n37","text":"Designing how users download files from any device involves several approaches.","id":"n55"},{"label":"File Sharing Design","children":[{"children":[{"parentId":"n68","text":"The file metadata schema would include a 'sharelist' field, e.g., 'sharelist': ['user2', 'user3'].","id":"n69","type":"DETL","label":"Metadata Sharelist Example","children":[]},{"label":"Challenges: Sharelist Query Performance","children":[],"type":"CONC","parentId":"n68","text":"Retrieving files shared *with* a user is slow, requiring scanning every file's sharelist.","id":"n70"}],"label":"Approach 1: Sharelist in Metadata","id":"n68","parentId":"n67","text":"A simple approach is adding a list of users with access (sharelist) directly to the file metadata.","type":"CONC"},{"type":"CONC","parentId":"n67","text":"A better approach caches an inverse mapping from a user to the files shared with them, in addition to the sharelist.","id":"n71","label":"Approach 2: Cached Inverse Mapping","children":[{"parentId":"n71","text":"A cache entry would look like 'user1': ['fileId1', 
'fileId2'] for quick lookup.","id":"n72","type":"DETL","label":"Cache Entry Example","children":[]},{"children":[],"label":"Challenges: Cache Sync","id":"n73","parentId":"n71","text":"The main challenge is keeping the cached sharedFiles list in sync with the sharelist in the file metadata.","type":"CONC"},{"type":"DCSN","id":"n74","parentId":"n71","text":"The best way to overcome sync issues is updating both sharelist and sharedFiles list within a transaction.","children":[],"label":"Sync Solution: Transactional Update"}]},{"type":"CONC","id":"n75","parentId":"n67","text":"Another approach fully normalizes data by creating a new SharedFiles table mapping userId to fileId.","children":[{"children":[],"label":"SharedFiles Table Schema","id":"n76","parentId":"n75","text":"The SharedFiles table has 'userId' (Partition Key) and 'fileId' (Sort Key) forming a composite primary key.","type":"DETL"},{"children":[],"label":"Eliminate Sharelist Sync","id":"n77","parentId":"n75","text":"This design removes the need for a 'sharelist' in file metadata and eliminates sync issues.","type":"JUST"},{"parentId":"n75","text":"This query is slightly less efficient due to index-based querying instead of a simple key-value lookup.","id":"n78","type":"CONC","label":"Challenges: Query Efficiency","children":[]},{"label":"Tradeoff: Sync vs Query","children":[],"parentId":"n75","text":"The trade-off of slightly less efficient queries is often worth eliminating the need to sync sharelists.","id":"n79","type":"JUST"}],"label":"Approach 3: Normalized Data Table"}],"parentId":"n37","text":"To support file sharing, the system needs an efficient mechanism to manage access for other users.","id":"n67","type":"CONC"},{"parentId":"n37","text":"Automatic file synchronization across devices requires handling changes from local to remote and remote to local.","id":"n80","type":"CONC","label":"File Sync Design","children":[{"id":"n81","parentId":"n80","text":"When a user updates a file locally, changes 
must sync to the remote server, which is considered the source of truth.","type":"CONC","children":[{"type":"DETL","id":"n82","parentId":"n81","text":"A client-side sync agent monitors local folder changes using OS-specific file system events.","children":[],"label":"Client-Side Sync Agent"},{"type":"DETL","id":"n83","parentId":"n81","text":"Upon detecting a change, the agent queues the modified file locally for upload.","children":[],"label":"Upload Queue"},{"id":"n84","parentId":"n81","text":"The agent uses the upload API to send changes and updated metadata to the server.","type":"DETL","children":[],"label":"Upload API Usage"},{"label":"Conflict Resolution Strategy","children":[],"parentId":"n81","text":"Conflicts are resolved using a 'last write wins' strategy, saving the most recent edit.","id":"n85","type":"DETL"},{"type":"INSG","id":"n86","parentId":"n81","text":"Versioning, though out of scope, would typically store new chunks or files rather than overwriting the only copy.","children":[],"label":"Versioning for Overwriting"}],"label":"Local to Remote Sync"},{"label":"Remote to Local Sync","children":[{"children":[{"label":"Polling Challenges","children":[],"type":"CONC","parentId":"n88","text":"Polling is simple but can be slow to detect changes and wastes bandwidth if nothing has changed.","id":"n89"}],"label":"Approach 1: Polling","type":"CONC","id":"n88","parentId":"n87","text":"The client periodically queries the server for changes since its last sync, using `updatedAt` timestamps."},{"type":"CONC","parentId":"n87","text":"The server maintains an open connection (WebSocket or SSE) with each client to push real-time change notifications.","id":"n90","label":"Approach 2: WebSocket or SSE","children":[{"children":[],"label":"WebSocket/SSE Challenges","id":"n91","parentId":"n90","text":"This approach is more complex but provides real-time updates.","type":"CONC"}]},{"type":"CONC","parentId":"n87","text":"A hybrid approach combines WebSocket/SSE for real-time updates with 
periodic polling as a safety net.","id":"n92","label":"Hybrid Sync Approach","children":[{"type":"DETL","id":"n93","parentId":"n92","text":"The server pushes change events in real-time through a single WebSocket connection per device/session.","children":[],"label":"Active Notification via WebSocket"},{"children":[],"label":"Periodic Polling Safety Net","type":"DETL","id":"n94","parentId":"n92","text":"Clients periodically poll (e.g., every few minutes) using GET /files/changes?since={timestamp} to catch missed changes."},{"label":"Hybrid Approach Benefits","children":[],"type":"JUST","parentId":"n92","text":"This approach provides real-time updates and guarantees eventual consistency even with connection interruptions.","id":"n95"}]}],"type":"CONC","parentId":"n80","text":"Clients need to detect and pull changes from the remote server to their local devices.","id":"n87"}]}],"label":"High-Level Design"},{"label":"Tying It All Together: Final System","children":[{"type":"DETL","id":"n97","parentId":"n96","text":"The client (web, mobile, or desktop app) uploads files and proactively identifies and pushes local changes.","children":[],"label":"Uploader Component"},{"parentId":"n96","text":"The client (potentially same as uploader) downloads files and determines when local files need remote updates.","id":"n98","type":"DETL","label":"Downloader Component","children":[]},{"children":[],"label":"LB & API Gateway","type":"DETL","id":"n99","parentId":"n96","text":"Handles routing requests, SSL termination, rate limiting, and request validation for application servers."},{"children":[],"label":"File Service","id":"n100","parentId":"n96","text":"Manages file metadata in the database and generates presigned URLs using the S3 SDK without direct file handling.","type":"DETL"},{"type":"DETL","id":"n101","parentId":"n96","text":"Stores file metadata (name, size, MIME type, uploader) and a shared files table for permissions enforcement.","children":[],"label":"File Metadata 
DB"},{"children":[],"label":"S3 (Blob Storage)","id":"n102","parentId":"n96","text":"Stores actual file contents, with direct uploads facilitated by presigned URLs from the file service.","type":"DETL"},{"label":"CDN (CloudFront)","children":[],"parentId":"n96","text":"Caches files globally to reduce latency; serves files from the nearest edge location using signed URLs.","id":"n103","type":"DETL"},{"type":"DETL","parentId":"n96","text":"CDN fetches files from S3 on a cache miss and serves from the edge on subsequent requests.","id":"n104","label":"CDN Fetch Process","children":[]}],"parentId":"n1","text":"A holistic view of the system components satisfying all functional requirements.","id":"n96","type":"CONC"},{"type":"CONC","parentId":"n1","text":"This section explores specific challenges and advanced solutions for the Dropbox system design.","id":"n105","label":"Potential Deep Dives","children":[{"label":"Support for Large Files","children":[{"type":"INSG","parentId":"n106","text":"Key user experience insights for large files include progress indicators and resumable uploads.","id":"n107","label":"User Experience Insights","children":[]},{"children":[{"label":"Timeout Issues","children":[{"parentId":"n109","text":"A 50GB file at 100Mbps takes approximately 1.11 hours to upload.","id":"n110","type":"STAT","label":"50GB Upload Time Calculation","children":[]}],"type":"DETL","parentId":"n108","text":"Web servers and clients have timeout settings, which a 50GB file upload can easily exceed.","id":"n109"},{"children":[{"parentId":"n111","text":"Amazon API Gateway has a hard limit of 10MB for request payloads.","id":"n112","type":"STAT","label":"API Gateway Size Limit","children":[]}],"label":"Browser and Server Limitations","type":"DETL","id":"n111","parentId":"n108","text":"Browsers and web servers, like Amazon API Gateway, often impose strict limits on request payload sizes."},{"type":"DETL","id":"n113","parentId":"n108","text":"Large files are more susceptible to 
network interruptions, forcing uploads to restart from scratch.","children":[],"label":"Network Interruptions"},{"children":[],"label":"Poor User Experience","id":"n114","parentId":"n108","text":"Users have no visibility into progress: they cannot see the upload status or estimated completion time.","type":"DETL"}],"label":"Limitations of Single POST Request","id":"n108","parentId":"n106","text":"Uploading large files via a single POST request faces several limitations.","type":"CONC"},{"type":"CONC","parentId":"n106","text":"Chunking breaks files into smaller pieces (e.g., 5-10 MB) for individual or parallel uploads.","id":"n115","label":"Chunking for Large Files","children":[{"type":"JUST","parentId":"n115","text":"Chunking must be done on the client side to effectively bypass server payload limitations.","id":"n116","label":"Client-Side Chunking","children":[]},{"children":[],"label":"Progress Indicator with Chunking","type":"DETL","id":"n117","parentId":"n115","text":"Chunking lets the client update a progress bar after each successfully uploaded chunk, improving UX."}]},{"children":[{"label":"FileMetadata Chunks Field","children":[],"parentId":"n118","text":"The FileMetadata schema includes a 'chunks' field, listing each chunk's ID and status (uploaded, uploading, not-uploaded).","id":"n119","type":"DETL"},{"label":"Chunk Status Sync Approach 1: Client Orchestration","children":[{"type":"CONC","id":"n121","parentId":"n120","text":"This approach creates security risks and inconsistent state, since a malicious client could fake chunk upload statuses.","children":[],"label":"Challenges: Client Orchestration Security"}],"parentId":"n118","text":"The client uploads chunks to S3, then sends PATCH requests to the backend to update chunk statuses in FileMetadata.","id":"n120","type":"CONC"},{"type":"CONC","id":"n122","parentId":"n118","text":"A better approach implements server-side verification of chunk uploads using ETags and S3's ListParts 
API.","children":[{"parentId":"n122","text":"This approach balances user experience with data integrity by accepting client updates but periodically verifying server-side.","id":"n123","type":"JUST","label":"Trust but Verify Principle","children":[]}],"label":"Chunk Status Sync Approach 2: Server-Side Verification"}],"label":"Resumable Uploads with Chunking","id":"n118","parentId":"n106","text":"Resumable uploads require tracking uploaded and remaining chunks, saving state in FileMetadata.","type":"CONC"},{"type":"CONC","parentId":"n106","text":"Resuming uploads requires uniquely identifying files and individual chunks.","id":"n124","label":"Unique File and Chunk Identification","children":[{"type":"DETL","parentId":"n124","text":"A fingerprint (cryptographic hash like SHA-256) identifies file content for deduplication and resumability.","id":"n125","label":"File Fingerprinting","children":[]},{"type":"DETL","parentId":"n124","text":"Generating fingerprints for each chunk allows precise identification of transmitted parts for resumable uploads.","id":"n126","label":"Chunk-Level Fingerprinting","children":[]}]},{"children":[{"parentId":"n127","text":"The client chunks the file into 5-10MB pieces, calculating fingerprints for each chunk and the entire file.","id":"n128","type":"DETL","label":"Step 1: Client Chunking & Fingerprinting","children":[]},{"children":[],"label":"Step 2: Check for Existing File","type":"DETL","id":"n129","parentId":"n127","text":"Client checks if a file with the same fingerprint exists and is 'uploading' to resume the upload."},{"type":"DETL","id":"n130","parentId":"n127","text":"If new, client POSTs to initiate a multipart upload; backend gets an S3 uploadId, generates chunk presigned URLs, and saves metadata.","children":[],"label":"Step 3: Initiate Multipart Upload"},{"children":[],"label":"Step 4: Upload Chunks & Update Status","type":"DETL","id":"n131","parentId":"n127","text":"Client uploads each chunk to S3, then PATCHes backend with 
chunk status and ETag; backend verifies and updates metadata."},{"children":[],"label":"Step 5: Complete Multipart Upload","type":"DETL","id":"n132","parentId":"n127","text":"Once all chunks are uploaded, backend calls S3's CompleteMultipartUpload API, then updates file metadata to 'uploaded'."},{"parentId":"n127","text":"Throughout the process, the client is responsible for tracking upload progress and updating the user interface.","id":"n133","type":"DETL","label":"Client UI Responsibility","children":[]}],"label":"Detailed Large File Upload Process","type":"CONC","id":"n127","parentId":"n106","text":"The comprehensive process for uploading a large file with chunking and fingerprinting involves multiple steps."},{"id":"n134","parentId":"n106","text":"Cloud storage providers like Amazon S3 offer a Multipart Upload feature that handles large objects in parts.","type":"CONC","children":[{"type":"INSG","parentId":"n134","text":"Candidates are expected to explain S3 Multipart Upload mechanics, not just state its use, to show understanding.","id":"n135","label":"Multipart Upload Interview Expectation","children":[]},{"parentId":"n134","text":"S3 event notifications only trigger when the entire multipart upload is completed, not for individual part uploads.","id":"n136","type":"DETL","label":"Multipart Upload Notifications","children":[]},{"type":"DETL","parentId":"n134","text":"To track individual part progress, S3's ListParts API can be used, which returns uploaded parts and their ETags.","id":"n137","label":"Tracking Individual Part Progress","children":[]}],"label":"S3 Multipart Upload Feature"},{"children":[{"parentId":"n138","text":"After assembly, downloads work like any normal file, using a single presigned or CDN signed URL.","id":"n139","type":"DETL","label":"Normal File Downloads","children":[]},{"type":"DETL","parentId":"n138","text":"For very large files, S3 and HTTP support Range requests, enabling parallel or resumable byte range 
downloads.","id":"n140","label":"HTTP Range Requests for Large Files","children":[]}],"label":"Chunked Downloads Not Needed","type":"CONC","id":"n138","parentId":"n106","text":"Chunked downloads are generally not needed as S3 assembles parts into a single object after multipart upload completion."}],"parentId":"n105","text":"Designing for large files requires addressing user experience and technical limitations of single requests.","id":"n106","type":"CONC"},{"type":"CONC","parentId":"n105","text":"Optimizing uploads, downloads, and syncing involves several techniques beyond basic approaches.","id":"n141","label":"Speed Optimization","children":[{"id":"n142","parentId":"n141","text":"CDNs cache files closer to the user, reducing latency and speeding up download times.","type":"DETL","children":[],"label":"Download Speed: CDN"},{"parentId":"n141","text":"Chunking maximizes bandwidth utilization by sending multiple chunks in parallel and adapting chunk sizes to network conditions.","id":"n143","type":"DETL","label":"Upload Speed: Chunking","children":[]},{"label":"Sync Speed: Chunking","children":[],"type":"DETL","parentId":"n141","text":"For syncing, chunking allows only changed chunks to be transferred, significantly speeding up the process.","id":"n144"},{"id":"n145","parentId":"n141","text":"Content-defined chunking (CDC) uses rolling hashes to determine chunk boundaries based on content, making delta sync efficient for small edits.","type":"CONC","children":[{"children":[],"label":"Fixed-Size Chunking Issue","type":"JUST","id":"n146","parentId":"n145","text":"Fixed-size chunking undermines delta sync because inserting or deleting even a few bytes shifts every subsequent chunk boundary."},{"parentId":"n145","text":"Systems like Dropbox use Rabin fingerprinting for CDC to achieve efficient delta sync.","id":"n147","type":"EXMP","label":"Rabin Fingerprinting for CDC","children":[]}],"label":"Content-Defined Chunking (CDC)"},{"id":"n148","parentId":"n141","text":"Compression reduces file size, meaning fewer bytes need to be transferred, 
speeding up uploads and downloads.","type":"CONC","children":[{"type":"JUST","id":"n149","parentId":"n148","text":"Compression happens on the client before uploading to S3, and decompression happens on the client after downloading.","children":[],"label":"Client-Side Compression"},{"label":"Smart Compression Logic","children":[],"type":"DETL","parentId":"n148","text":"Client-side logic should decide whether to compress based on file type, size, and network conditions.","id":"n150"},{"type":"EXMP","parentId":"n148","text":"Media files like images and videos have low compression ratios, making compression often not worthwhile.","id":"n151","label":"Media Files Compression","children":[]},{"children":[],"label":"Text Files Compression","id":"n152","parentId":"n148","text":"Text files can achieve high compression ratios, potentially reducing a 5GB file to 1GB or less.","type":"EXMP"},{"label":"Compression Algorithms","children":[{"id":"n154","parentId":"n153","text":"Gzip is widely used and broadly supported.","type":"DETL","children":[],"label":"Gzip Algorithm"},{"children":[],"label":"Brotli Algorithm","id":"n155","parentId":"n153","text":"Brotli generally offers better compression ratios than Gzip, especially for text, and is supported by modern browsers.","type":"DETL"},{"parentId":"n153","text":"Zstandard provides an excellent balance of speed and compression ratio, compressing and decompressing faster than Gzip.","id":"n156","type":"DETL","label":"Zstandard (zstd) Algorithm","children":[]},{"parentId":"n153","text":"Zstandard is a strong choice for client-side compression due to its fast compression speed.","id":"n157","type":"DCSN","label":"Zstandard for Client-Side Compression","children":[]}],"type":"CONC","parentId":"n148","text":"Common compression algorithms include Gzip, Brotli, and Zstandard, each with tradeoffs in ratio and speed.","id":"n153"},{"id":"n158","parentId":"n148","text":"Always compress files before encrypting them, as encryption introduces 
randomness that hinders compression.","type":"INSG","children":[],"label":"Compress Before Encrypting"}],"label":"Compression for Speed"}]},{"label":"File Security","children":[{"label":"Encryption in Transit (HTTPS)","children":[],"parentId":"n159","text":"Using HTTPS encrypts data transfer between client and server, a standard practice supported by modern browsers.","id":"n160","type":"DETL"},{"children":[],"label":"Encryption at Rest (S3)","type":"DETL","id":"n161","parentId":"n159","text":"Encrypting files stored in S3 is a native feature; S3 encrypts files with unique keys stored separately."},{"parentId":"n159","text":"The shareList or separate share table/cache serves as the basic Access Control List (ACL).","id":"n162","type":"CONC","label":"Access Control (ACL)","children":[{"children":[],"label":"Signed URLs for Secure Downloads","type":"DETL","id":"n163","parentId":"n162","text":"Download links are generated as signed URLs, valid only for a short period (e.g., 5 minutes)."},{"label":"Signed URLs as Bearer Tokens","children":[],"type":"INSG","parentId":"n162","text":"Signed URLs are bearer tokens, meaning anyone with a valid, unexpired URL can download the file.","id":"n164"},{"label":"Short Expiration for Security","children":[],"parentId":"n162","text":"A short expiration window limits exposure but does not fully prevent unauthorized sharing.","id":"n165","type":"JUST"},{"type":"DETL","id":"n166","parentId":"n162","text":"Signed URLs are generated on the server, incorporating a signature, expiration timestamp, and optional restrictions.","children":[],"label":"CDN Signed URL Generation"},{"children":[],"label":"CDN Signed URL Distribution","type":"DETL","id":"n167","parentId":"n162","text":"The signed URL is distributed to an authorized user to access the resource directly from the CDN."},{"type":"DETL","id":"n168","parentId":"n162","text":"CDN verifies the signature, expiration, and restrictions; serves content if valid, denies access 
otherwise.","children":[],"label":"CDN Signed URL Validation"}]}],"parentId":"n105","text":"Ensuring file security involves encryption in transit, encryption at rest, and robust access control.","id":"n159","type":"CONC"}]},{"parentId":"n1","text":"Expectations for system design interviews vary significantly based on candidate experience level (Mid-level, Senior, Staff+).","id":"n169","type":"CONC","label":"Interview Expectations by Level","children":[{"table":{"rows":[{"cells":["80% Breadth / 20% Depth","Drives the early stages; the interviewer probes the basics and drives the later stages","Defines API, data model, functional high-level design; reasons through probing questions."],"label":"Mid-level (E4)"},{"cells":["60% Breadth / 40% Depth","Proactive; anticipates challenges and suggests improvements","Moves quickly through the high-level design; discusses large files, multipart upload, and trade-offs in depth."],"label":"Senior (E5)"},{"cells":["40% Breadth / 60% Depth","Exceptional proactivity; identifies and solves issues independently, with the interviewer stepping in only to focus the discussion","Deep dive into nuances, practical application of technologies, confident solutions from experience, treats interviewer as peer."],"label":"Staff+ (E6+)"}],"cols":["Level","Breadth vs Depth","Driving/Proactivity","Dropbox Problem Bar"]},"label":"Candidate Expectations: Dropbox Problem","children":[],"parentId":"n169","text":"This comparison outlines the expected scope and depth of knowledge for Mid-level, Senior, and Staff+ candidates tackling the Dropbox system design problem.","id":"n170","type":"CMPR"}]}],"type":"APEX","text":"This document outlines the system design for a cloud-based file storage service like Dropbox, focusing on functional and non-functional requirements, core entities, API, and deep dives.","id":"n1"},"slug":"design-a-file-storage-service-like-dropb-787b3e","sharedAt":{"_seconds":1780506184,"_nanoseconds":116000000},"title":"Design a File Storage Service Like Dropbox"}