Clayton Schoeny

UCLA

  • About Me
  • Publications
  • Data Science Mini-Projects
  • Gallery
  • Download CV
  • Download Resume
  • LinkedIn Profile
  • Contact

Clayton Schoeny

I am a Product Data Scientist at Cash App, on the Cash Card team. I'm happy to be on a team that directly furthers Square's mission of economic empowerment. I love my current job because it combines programming, statistics, machine learning, and business acumen to drive critical decisions.

I'm grateful for my previous role as a Senior Data Scientist at Fair, a FinTech/automotive start-up in Santa Monica working to transform the idea of car ownership.

I'm a Los Angeles local, born and raised in Redondo Beach. Since childhood, I have had a passion for math, science, and technology. I received my BS, MS, and PhD degrees in Electrical Engineering from UCLA. I continued my education in order to delve deeper into advanced mathematical concepts and to advance human knowledge through research. I have a strong math background and enjoy the statistical aspects of information theory and probability theory. My doctoral research focused on utilizing both the inherent redundancy in data and the asymmetric errors in specific channels to store, analyze, and transmit data efficiently.

This website provides a snapshot of my background and capabilities, highlighting my education and industry experiences, publications, and personal mini-projects.

Industry Experience

  • 2020-Present

    Cash App (Square)

    Product Data Scientist

    • Worked directly with both client and server engineers to establish and track metrics for new feature rollouts.

    • Created numerous ETLs to bring data into Snowflake for wider consumption.

    • Devised, analyzed, and executed policy proposals for a variety of challenging business decisions.

  • 2018-2020

    Fair

    Senior Data Scientist

    Created the following for a personalization microservice:

    • Cold Start -- predicts favorable makes, models, and body styles for our users.

    • Dynamic Search -- populates the order of makes, models, and body styles based on a user's in-app behavior.

    • Survey Integration -- customizes the home screen and search results based on a user's survey responses.

    Data Scientist

    • CarRank algorithm design and implementation -- supervised learning techniques to rank the cars in the Fair app. Invented a novel tuning procedure for the ranking algorithm (patent pending).

    • Customer segmentation based on initial in-app behavior -- unsupervised machine learning techniques. Used for targeted re-engagement campaigns.

    • Vehicle Bucketing -- created a library to assign a vehicle to a particular filter/tag in the app depending on the quality of the vehicle and the specific customer.

  • 2015

    Space and Naval Warfare Systems Center Pacific (SPAWAR SSC-Pacific)

    Naval Research Enterprise Internship Program

    Command and Control

    • Developed new error-correcting codes used to synchronize Command and Control data in disconnected, intermittent, and low-bandwidth environments. The codes are the best known ECCs for bursty deletions/insertions.

  • 2012-2013

    DIRECTV

    Space, Communications, Video, and Design Thinking

    • Created reports detailing the probable outcomes of implementing a new bit-rate harvesting scheme.

    • Investigated the technical problems in incidences where audio loudness was inconsistent across programs and commercials.

    • Wrote a shell script that enabled us to easily test archived set top boxes for compatibility with new descriptive video services.

  • 2010-2011

    The Aerospace Corporation

    Communication Systems Engineering Department

    Performance and Analysis Division

    • Developed MATLAB scripts that simulate the performance of the Advanced Extremely High Frequency (AEHF) satellite in response to sophisticated frequency jammers.

    • Wrote C++ scripts (along with the Satellite Orbit Analysis Program) that analyze coverage of the shaped satellite beams on the Wideband Global SATCOM (WGS) system.

Education

  • Ph.D. 2014-2018

    Doctor of Philosophy - Electrical and Computer Engineering

    University of California, Los Angeles

    Research Lab: Laboratory for Robust Information Systems

    Dissertation: Coding for Future Large-Scale Data Systems

    GPA: 4.0

  • M.S. 2012-2014

    Master of Science - Electrical Engineering

    University of California, Los Angeles

    Research Lab: Laboratory for Robust Information Systems

    Thesis: Efficient File Synchronization

    GPA: 3.7

  • B.S. 2007-2012

    Bachelor of Science - Electrical Engineering

    Minor in Mathematics

    University of California, Los Angeles

    Latin Honors: Cum Laude

    Senior Design Project: Interactive Speech Recognition

Honors & Awards

  • 2018
    Distinguished PhD Dissertation in Signals & Systems Award
    Award given to the best PhD dissertation in the Signals & Systems track of the Electrical and Computer Engineering Department at UCLA.

    UCLA press release

  • 2018
    Best Paper Award: IEEE SELSE

    Conference: IEEE Workshop on Silicon Errors in Logic – System Effects

    Title: Parity++: Lightweight Error Correction for Last Level Caches

    Authors: Irina Alam, Clayton Schoeny, Lara Dolecek, Puneet Gupta

    Our paper is one of three papers to win the "Best of SELSE" award at 2018's gathering of computing reliability experts. The paper re-appeared in a special session and was published in the proceedings of the prestigious IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) in Luxembourg City, Luxembourg, June 2018.

  • 2017
    Best Paper Award: ACM/IEEE Embedded Systems Week

    Conference: ACM/IEEE International Conference on Compilers, Architecture, and System Synthesis (CASES)

    Title: Low-Cost Memory Fault Tolerance for IoT Devices

    Authors: Mark Gottscho, Irina Alam, Clayton Schoeny, Lara Dolecek, Puneet Gupta

    Winner of the Best Paper Award at the ACM/IEEE International Conference on Compilers, Architecture, and System Synthesis (CASES), at Embedded Systems Week in Seoul, Korea. The paper appeared in a special ESWEEK issue of ACM Transactions on Embedded Computing Systems.

    UCLA press release


  • 2017
    2017-2018 UCLA Dissertation Year Fellowship Winner
    A campus wide fellowship to support doctoral students who are within one year of completing and filing their dissertation. Award winners are chosen based on the impact, innovation, and significance of the work.
  • 2016
    Best Paper Award: IEEE SELSE

    Conference: IEEE Workshop on Silicon Errors in Logic – System Effects

    Title: Software-Defined Error-Correcting Codes

    Authors: Mark Gottscho, Clayton Schoeny, Lara Dolecek, Puneet Gupta

    Our paper is one of three papers to win the "Best of SELSE" award at 2016's gathering of computing reliability experts. The paper re-appeared in a special session and was published in the proceedings of the prestigious IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) in Toulouse, France, June 2016.

  • 2016
    2016 Qualcomm Innovation Fellowship Winner

    Innovation Topic: Software-Defined Error-Correcting Codes

    Coauthor: Mark Gottscho

    One of eight winning proposals for the prestigious $100K/1 year Qualcomm Innovation Fellowship (QInF) in 2016. There were 129 proposals submitted by teams of two PhD students from only the top 18 ranked US universities (6.2% acceptance rate).

    Daily Bruin article

    EE news article


  • 2015
    2014-2015 Henry Samueli Excellence in Teaching Award
    Best Teaching Assistant for EE Lecture Course: Spring 2015 EE132A Introduction to Communication Systems.


  • 2015
    2015 Qualcomm Innovation Fellowship Finalist

    Innovation Topic: Coding Techniques for Next-Generation 3-D Flash Memories

    Coauthor: Frederic Sala

    There were 146 proposals submitted by teams of two PhD students from only the top 18 ranked US universities. Of these, 35 finalists were selected (acceptance rate 24.0%).

  • 2012
    Dean's Honors List

Selected Publications

Below is a list of my publications, sortable by type and date. Works that are currently in progress or in review are not included. Workshop papers are also not included but are listed in my CV. Thank you to all my coauthors; each of these projects has been a lot of fun to work on. The impact of my work can be assessed at my Google Scholar page.

Parity++: Lightweight Error Correction for Last Level Caches

Irina Alam, Clayton Schoeny, Lara Dolecek, Puneet Gupta
Conference Paper IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W), Luxembourg City, Luxembourg, Jun. 2018

Best Paper Award at The 14th Workshop on Silicon Errors in Logic - System Effects (SELSE), Re-appeared in Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W)

Abstract

As the size of on-chip SRAM caches is increasing rapidly and the physical dimension of the SRAM devices is decreasing, reliability of caches is becoming a growing concern. This is because with increased size of caches, the likelihood of radiation-induced soft faults also increases. As a result, information redundancy in the form of Error Correcting Codes (ECC) is becoming extremely important, especially to protect the larger sized last level caches (LLCs). In typical ECCs, extra redundancy bits are added to every row to detect and correct errors. Additional encoding (while writing data) and decoding (while reading data) procedures are required as well. In caches, these additional area, power and latency overheads need to be minimized as much as possible. To address this problem, we present in this paper Parity++: a novel unequal message protection scheme for last level caches that preferentially provides stronger error protection to certain “special messages”. This protection scheme provides Single Error Detection (SED) for all messages and Single Error Correction (SEC) for a subset of messages. Thus, it is stronger than just a basic SED parity and has ∼9% lower storage overhead and much lower error detection energy than a traditional Single Error Correcting, Double Error Detecting (SECDED) code. We also propose a memory speculation procedure that can be used with any ECC scheme to hide the decoding latency while reading messages when there are no errors.
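
For context on the baseline being improved here, below is a minimal sketch of plain single-error-detecting (SED) parity, the scheme that Parity++ strengthens for special messages. This illustrates the baseline only, not the paper's construction.

    def encode_sed(bits):
        # Append an even-parity bit so the stored word always has even weight.
        return bits + [sum(bits) % 2]

    def has_error(codeword):
        # Any single bit flip makes the total weight odd.
        return sum(codeword) % 2 != 0

    word = [1, 0, 1, 1, 0, 0, 1, 0]
    cw = encode_sed(word)
    cw[3] ^= 1                  # inject a single-bit soft fault
    assert has_error(cw)        # detected -- but, unlike Parity++, not correctable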

Context-Aware Resiliency: Unequal Message Protection for Random-Access Memories

Clayton Schoeny, Frederic Sala, Mark Gottscho, Irina Alam, Puneet Gupta, Lara Dolecek
Conference Paper IEEE Information Theory Workshop (ITW), Kaohsiung, Taiwan, Nov. 2017

Abstract

A common way to protect data stored in DRAM and related memory systems is through the use of a single-error-correcting/double-error-detecting (SECDED) code. Traditionally, these error-correcting codes provide equal protection guarantees to all messages. In a recent work, we demonstrated enhanced error correction capabilities for SECDED codes by taking into account contextual side-information about the data. This paper is concerned with a closely related scenario: unequal message protection (UMP), where a subset of special messages is afforded additional error-correction ability. UMP is relevant to computing systems where certain messages are critical and failures cannot be tolerated. We study practical UMP constructions where messages are guaranteed either one or two bit-error-correction. We provide upper and lower bounds on the number of special messages. We introduce an explicit and practical code construction based on BCH subcodes and demonstrate the efficacy of our technique on data from the AxBench and SPEC CPU2006 benchmark suites.

Order-Optimal Permutation Codes in the Generalized Cayley Metric

Siyi Yang, Clayton Schoeny, Lara Dolecek
Conference Paper IEEE Information Theory Workshop (ITW), Kaohsiung, Taiwan, Nov. 2017

Abstract

Permutation codes have recently garnered substantial research interest. In this paper, we study the theoretical bounds and the construction of permutation codes in the generalized Cayley metric. The generalized Cayley metric captures the number of generalized transposition errors in a permutation, and subsumes existing error types including transpositions and translocations without imposing restrictions on the lengths and positions of the translocated segments. Relying on the breakpoint analysis proposed by Chee and Vu, we construct a new class of permutation codes. Our proposed scheme, although it is not explicitly constructive, is proved to have an order-optimal rate. Based on this method, we then come up with the coding scheme of another class of permutation codes that is not only explicitly constructive but also systematic, in the generalized Cayley metric. We prove the existence of order-optimal systematic codes based on this method and provide a construction of such a code. Finally we prove that our coding schemes have less redundancy than the existing codes based on interleaving when the codelength is sufficiently large and the number of errors is relatively small.
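
The classical Cayley distance between two permutations (the minimum number of transpositions turning one into the other) equals n minus the number of cycles of p composed with the inverse of q; the generalized Cayley metric studied above extends this to generalized transpositions. A minimal sketch of the classical case:

    def cayley_distance(p, q):
        # Distance = n - (number of cycles of p o q^{-1}).
        n = len(p)
        q_inv = [0] * n
        for i, v in enumerate(q):
            q_inv[v] = i
        sigma = [p[q_inv[i]] for i in range(n)]
        visited, cycles = [False] * n, 0
        for i in range(n):
            if not visited[i]:
                cycles += 1
                j = i
                while not visited[j]:
                    visited[j] = True
                    j = sigma[j]
        return n - cycles

    print(cayley_distance([0, 1, 2, 3], [0, 1, 2, 3]))   # 0
    print(cayley_distance([1, 0, 2, 3], [0, 1, 2, 3]))   # 1 (one transposition)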

Low-Cost Memory Fault Tolerance for IoT Devices

Mark Gottscho, Irina Alam, Clayton Schoeny, Lara Dolecek, Puneet Gupta
Journal Paper ACM Transactions on Embedded Computing Systems (TECS) - Special Issue ESWEEK 2017

Winner of the Best Paper Award at the 2017 ACM/IEEE Embedded Systems Week Conference, Seoul, Korea, Oct. 2017

Abstract

IoT devices need reliable hardware at low cost. It is challenging to efficiently cope with both hard and soft faults in embedded scratchpad memories. To address this problem, we propose a two-step approach: FaultLink and Software-Defined Error-Localizing Codes (SDELC). FaultLink avoids hard faults found during testing by generating a custom-tailored application binary image for each individual chip. During software deployment-time, FaultLink optimally packs small sections of program code and data into fault-free segments of the memory address space and generates a custom linker script for a lazy-linking procedure. During run-time, SDELC deals with unpredictable soft faults via novel and inexpensive Ultra-Lightweight Error-Localizing Codes (UL-ELCs). These require fewer parity bits than single-error-correcting Hamming codes. Yet our UL-ELCs are more powerful than basic single-error-detecting parity: they localize single-bit errors to a specific chunk of a codeword. SDELC then heuristically recovers from these localized errors using a small embedded C library that exploits observable side information (SI) about the application’s memory contents. SI can be in the form of redundant data (value locality), legal/illegal instructions, etc. Our combined FaultLink+SDELC approach improves min-VDD by up to 440 mV and correctly recovers from up to 90% (70%) of random single-bit soft faults in data (instructions) with just three parity bits per 32-bit word.
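
To see how a few parity bits can localize (rather than correct) an error, consider the simplest possible chunked-parity layout: one parity bit per chunk of a 32-bit word. This toy sketch conveys the idea only; the UL-ELC constructions in the paper are more refined.

    CHUNKS = [(0, 11), (11, 22), (22, 32)]      # three chunks of a 32-bit word

    def chunk_parities(bits):
        return [sum(bits[a:b]) % 2 for a, b in CHUNKS]

    word = [0] * 32
    stored = chunk_parities(word)               # three parity bits per word
    word[17] ^= 1                               # single-bit soft fault
    changed = [i for i, (old, new) in enumerate(zip(stored, chunk_parities(word)))
               if old != new]
    print("error localized to chunk", changed[0])   # chunk 1, i.e., bits 11..21

Side information (value locality, legal instructions, and so on) then guides the guess of which bit inside the flagged chunk to flip.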

Novel Combinatorial Coding Results for DNA Sequencing and Data Storage

Clayton Schoeny, Frederic Sala, Lara Dolecek
Conference Paper IEEE Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, Nov. 2017

Abstract

Due to its incredible density and durability, DNA storage is a promising storage medium with the potential to handle the problems associated with the current exponential growth of data. However, DNA is an exotic storage technology, and as such, it can experience types of errors that typically do not occur in traditional storage devices, such as insertions and deletions. We build upon a previous work by Schoeny et al. to construct an efficient non-binary burst-deletion correcting code. By setting the alphabet size to 4, this code is specifically suited for correcting burst-deletions in DNA storage.

A Coding Scheme for Reliable In-Memory Hamming Distance Computation

Zehui Chen, Clayton Schoeny, Yuval Cassuto, Lara Dolecek
Conference Paper IEEE Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, Nov. 2017

Abstract

Computation-in-memory is a technique that has shown great potential in reducing the burden of massive data processing. Allowing for ultra-fast Hamming distance computations to be performed in-memory will drastically speed up many modern machine-learning algorithms. However, these in-memory calculations have not been studied in the presence of process variabilities. In this paper, we develop coding schemes to reliably compute, in-memory, the Hamming distances of pairs of vectors in the presence of write-time errors. Using an inversion coding technique, we establish error-detection guarantees as a function of the number of errors and the non-ideality of the resistive array memory in which the data is stored. To correct errors in the vector similarity comparison, we propose codes that achieve error correction and useful techniques for bit level data access and error localization. We demonstrate the effectiveness of our coding scheme on a simple example using the k-nearest neighbors algorithm.

Codes Correcting a Burst of Deletions or Insertions

Clayton Schoeny, Antonia Wachter-Zeh, Ryan Gabrys, Eitan Yaakobi
Journal Paper IEEE Transactions on Information Theory, vol. 63, no. 4, Jan. 2017

Abstract

This paper studies codes that correct a burst of deletions or insertions. Namely, a code will be called a b-burst-deletion/insertion-correcting code if it can correct a burst of deletions/insertions of any b consecutive bits. While the lower bound on the redundancy of such codes was shown by Levenshtein to be asymptotically log(n) + b - 1, the redundancy of the best code construction by Cheng et al. is b(log(n/b + 1)). In this paper, we close on this gap and provide codes with redundancy at most log(n) + (b - 1) log(log(n)) + b - log(b). We first show that the models of insertions and deletions are equivalent and thus it is enough to study codes correcting a burst of deletions. We then derive a non-asymptotic upper bound on the size of b-burst-deletion-correcting codes and extend the burst deletion model to two more cases: (1) a deletion burst of at most b consecutive bits and (2) a deletion burst of size at most b (not necessarily consecutive). We extend our code construction for the first case and study the second case for b = 3, 4.

(Partial results from this manuscript were presented at the 2016 ISIT conference.)
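
For b = 1, the single-deletion case, the classic Varshamov-Tenengolts (VT) codes that this construction generalizes can be demonstrated in a few lines. The brute-force decoder below simply tries every reinsertion; a real VT decoder is far more efficient.

    def in_vt_code(x, a=0):
        # VT code: codewords satisfy sum of (i+1)*x_i = a (mod n+1).
        n = len(x)
        return sum((i + 1) * b for i, b in enumerate(x)) % (n + 1) == a

    def decode_one_deletion(y, a=0):
        # Try every single-bit reinsertion; the VT property guarantees
        # exactly one candidate lands in the code.
        n = len(y) + 1
        candidates = {tuple(y[:i] + [b] + y[i:]) for i in range(n) for b in (0, 1)}
        matches = [list(c) for c in candidates if in_vt_code(list(c), a)]
        assert len(matches) == 1
        return matches[0]

    x = [1, 0, 0, 0, 0, 0, 0, 1]          # VT checksum 1 + 8 = 9 = 0 (mod 9)
    assert in_vt_code(x)
    y = x[:3] + x[4:]                     # channel deletes one bit
    assert decode_one_deletion(y) == x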

Exact Reconstruction From Insertions in Synchronization Codes

Frederic Sala, Ryan Gabrys, Clayton Schoeny, Lara Dolecek
Journal Paper IEEE Transactions on Information Theory, vol. 63, no. 4, Jan. 2017

Abstract

This paper studies problems in data reconstruction, an important area with numerous applications. In particular, we examine the reconstruction of binary and nonbinary sequences from synchronization (insertion/deletion-correcting) codes. These sequences have been corrupted by a fixed number of symbol insertions (larger than the minimum edit distance of the code), yielding a number of distinct traces to be used for reconstruction. We wish to know the minimum number of traces needed for exact reconstruction. This is a general version of a problem tackled by Levenshtein for uncoded sequences. We introduce an exact formula for the maximum number of common supersequences shared by sequences at a certain edit distance, yielding an upper bound on the number of distinct traces necessary to guarantee exact reconstruction. Without specific knowledge of the code words, this upper bound is tight. We apply our results to the famous single deletion/insertion-correcting Varshamov-Tenengolts (VT) codes and show that a significant number of VT code word pairs achieve the worst case number of outputs needed for exact reconstruction. We also consider extensions to other channels, such as adversarial deletion and insertion/deletion channels and probabilistic channels.

(Partial results from this manuscript were presented at the 2015 and 2016 ISIT conferences.)
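
The supersequence counts at the heart of this paper can be sanity-checked by brute force for small parameters: build the t-insertion balls of two sequences and intersect them. A sketch:

    def insertion_ball(x, t, alphabet=(0, 1)):
        # All supersequences of x obtainable with exactly t insertions.
        ball = {tuple(x)}
        for _ in range(t):
            ball = {s[:i] + (a,) + s[i:]
                    for s in ball for i in range(len(s) + 1) for a in alphabet}
        return ball

    x, y, t = (0, 1, 0, 1), (0, 0, 1, 1), 2
    common = insertion_ball(x, t) & insertion_ball(y, t)
    print(len(common), "common supersequences after", t, "insertions")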

On Nonuniform Noisy Decoding for LDPC Codes With Application to Radiation-Induced Errors

Frederic Sala, Clayton Schoeny, Shahroze Kabir, Dariush Divsalar, Lara Dolecek
Journal Paper IEEE Transactions on Communications, vol. 65, no. 4, Jan. 2017

Abstract

Recent studies on noisy decoding for LDPC codes rely on the assumption that the noise in each component is independent and perpetual. This paper examines a noisy decoding model that generalizes this approach: the noise is due to multi-state channels, where the channel states are governed by queue-like processes. This model is inspired by errors in decoders that are due to the high levels of radiation. This is an important problem, as modern non-volatile memories (NVMs) must perform well in high-radiation environments if they are to be used for deep space applications. High levels of radiation have a significant impact on floating gate-based NVMs, such as flash, and therefore, require well-tuned, powerful error-correcting codes for reliable data storage along with the decoders capable of handling radiation-induced noisy components. We introduce a noisy LDPC decoding model subsuming certain previously studied models. This model is better suited to represent transient errors-in both variable nodes and check nodes-and allows for a more refined analysis compared with older, coarser models. We perform a density evolution-like theoretical evaluation, applicable to both regular and irregular codes, optimize the voting threshold for a Gallager B/E-decoder, and analyze the resulting evaluation. We also examine the finite block length case.

(Partial work from this manuscript was previously presented at the 2016 Asilomar conference.)
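
For readers unfamiliar with the decoders being analyzed, here is a toy bit-flipping decoder in the Gallager B family, running on fault-free hardware; the paper's subject is precisely what happens when the components executing such a loop are themselves noisy.

    import numpy as np

    H = np.array([[1, 1, 0, 1, 0, 0],
                  [0, 1, 1, 0, 1, 0],
                  [1, 0, 1, 0, 0, 1]])          # tiny parity-check matrix

    def bit_flip_decode(r, H, threshold=2, max_iters=10):
        r = r.copy()
        for _ in range(max_iters):
            syndrome = H.dot(r) % 2
            if not syndrome.any():
                return r                        # all checks satisfied
            unsat = H.T.dot(syndrome)           # unsatisfied checks per bit
            flips = unsat >= threshold          # flip bits failing the vote
            if not flips.any():
                return r                        # stuck: decoding failure
            r[flips] ^= 1
        return r

    received = np.array([1, 0, 0, 0, 0, 0])     # all-zero codeword + 1 flip
    print(bit_flip_decode(received, H))         # -> [0 0 0 0 0 0]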

Flash Memories in High Radiation Environments: LDPC Decoder Study

Frederic Sala, Clayton Schoeny, Shahroze Kabir, Dariush Divsalar, Lara Dolecek
Conference Paper IEEE Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, Nov. 2016

Abstract

Flash memory devices are increasingly being used in deep-space missions as on-board data storage in spacecraft. The harsh environment these missions take place in involves high levels of radiation, which can cause decoding circuitry failures for the device error-correction module. For this reason, the decoder must be robust to radiation-induced errors. Previous works have studied the hardware failure problem by modeling the circuitry noise as independent and constant in each component of the decoder. We propose a multi-state radiation channel which is designed to account for the dependence and duration of radiation-induced errors on different components. This model also subsumes some previously studied cases and allows for a more refined analysis. We introduce a corresponding LDPC combined Gallager B/E decoder and perform a density evolution analysis to characterize the idealized decoder performance. We optimize the decoder voting threshold parameter and apply our findings to a finite length case.

Approximate File Synchronization: Upper Bounds and Interactive Algorithms

Amirhossein Reisizadehmobarakeh, Clayton Schoeny, Chi-Yo Tsai, Lara Dolecek
Conference Paper IEEE Information Theory Workshop (ITW), Cambridge, UK, Sep. 2016

Abstract

File synchronization is a critical component of many modern data sharing applications. File synchronization is particularly challenging when different versions of a file that need to be synchronized differ in some number of edits. Several recent works have studied exact synchronization under edit errors. In this paper, we extend the available analytical toolbox of file synchronization by focusing on a previously unaddressed case of approximate synchronization wherein the reconstructed file need not be the same as the original file, but rather only be within some predetermined distortion. We study the case when a binary file undergoes symbol-level deletion errors with some small deletion rate (so that the total number of deletions is linear in file length). We derive a simple upper bound on the optimal rate of information that the transmitter (owner of the original file) needs to provide to the receiver (owner of the edited file) to allow the receiver to reconstruct the original file to within a predefined target distortion. We then create an approximate synchronization algorithm based on interactive communication between the transmitter and the receiver, and analyze the expected normalized communication bandwidth of our algorithm for various target distortion levels. Lastly, we implement our algorithm and provide experimental results on the approximate synchronization of a noisy image.
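
As a point of contrast, the simplest interactive synchronization idea is rsync-style block hashing, sketched below. It handles substitutions but not the deletions studied above, since a deletion shifts every later block boundary, which is exactly why edit-channel synchronization needs more careful protocols.

    import hashlib

    def block_hashes(data, block=8):
        # The transmitter sends one hash per fixed-size block.
        return [hashlib.sha256(data[i:i + block]).hexdigest()
                for i in range(0, len(data), block)]

    original = b"the quick brown fox jumps over the lazy dog"
    edited   = b"the quick brown cat jumps over the lazy dog"   # one substitution

    stale = [i for i, (h_a, h_b) in
             enumerate(zip(block_hashes(original), block_hashes(edited)))
             if h_a != h_b]
    print("blocks to retransmit:", stale)   # only the block containing the edit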

Codes Correcting a Burst of Deletions or Insertions

Clayton Schoeny, Antonia Wachter-Zeh, Ryan Gabrys, Eitan Yaakobi
Conference Paper IEEE International Symposium on Information Theory (ISIT), Barcelona, Spain, Jul. 2016

Abstract

This paper studies codes that correct bursts of deletions. Namely, a code will be called a b-burst-correcting code if it can correct a deletion of any b consecutive bits. While the lower bound on the redundancy of such codes was shown by Levenshtein to be asymptotically log(n) + b - 1, the redundancy of the best code construction by Cheng et al. is b(log(n/b + 1)). In this paper we close on this gap and provide codes with redundancy at most log(n) + (b - 1) log(log(n)) + b - log(b). We also extend the burst deletion model to two more cases: 1. a deletion burst of at most b consecutive bits and 2. a deletion burst of size at most b (not necessarily consecutive). We extend our code construction for the first case and study the second case for b = 3, 4. The equivalent models for insertions are also studied and are shown to be equivalent to correcting the corresponding burst of deletions.

Exact Sequence Reconstruction for Insertion-Correcting Codes

Fred Sala, Ryan Gabrys, Clayton Schoeny, Kayvon Mazooji, Lara Dolecek
Conference Paper IEEE International Symposium on Information Theory (ISIT), Barcelona, Spain, Jul. 2016

Abstract

We study the problem of perfectly reconstructing sequences from traces. The sequences are codewords from a deletion/insertion-correcting code and the traces are the result of corruption by a fixed number of symbol insertions (larger than the minimum edit distance of the code.) This is the general version of a problem tackled by Levenshtein for uncoded sequences. We introduce an exact formula for the maximum number of common supersequences shared by sequences at a certain edit distance, yielding a tight upper bound on the number of distinct traces necessary to guarantee exact reconstruction. We apply our results to the famous single deletion/insertion-correcting Varshamov-Tenengolts (VT) codes and show that a significant number of VT codeword pairs achieve the worst-case number of outputs needed for exact reconstruction.

The Weight Consistency Matrix Framework for General Non-Binary LDPC Code Optimization: Applications in Flash Memories

Ahmed Hareedy, Chinmayi Lanka, Clayton Schoeny, Lara Dolecek
Conference Paper IEEE International Symposium on Information Theory (ISIT), Barcelona, Spain, Jul. 2016


Advanced Algebraic and Graph-Based ECC Schemes for Modern NVMs

Frederic Sala, Clayton Schoeny, Lara Dolecek
Book Chapter 3D Flash Memories, ed. Rino Micheloni, Springer, 1st ed., June 27, 2016, ISBN-10: 9401775109

Abstract

Transmission channels underlying modern memory systems, e.g., Flash memories, possess a significant amount of asymmetry. While existing LDPC codes optimized for symmetric, AWGN-like channels are being actively considered for Flash applications, we demonstrate that, due to channel asymmetry, such approaches are fairly inadequate. We propose a new, general, combinatorial framework for the analysis and design of non-binary LDPC (NB-LDPC) codes for asymmetric channels. We introduce a refined definition of absorbing sets, which we call general absorbing sets (GASs), and an important subclass of GASs, which we refer to as general absorbing sets of type two (GASTs). Additionally, we study the combinatorial properties of GASTs. We then present the weight consistency matrix (WCM), which succinctly captures key properties in a GAST. Based on these new concepts, we then develop a general code optimization framework, and demonstrate its effectiveness on the realistic highly-asymmetric normal-Laplace mixture (NLM) Flash channel. Our optimized codes enjoy over one order (resp., half of an order) of magnitude performance gain in the uncorrectable BER (UBER) relative to the unoptimized codes (resp. the codes optimized for symmetric channels).

Software-Defined Error-Correcting Codes

Mark Gottscho, Clayton Schoeny, Lara Dolecek, Puneet Gupta
Conference Paper IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W), Toulouse, France, Jun. 2016

Best Paper Award at The 12th Workshop on Silicon Errors in Logic - System Effects (SELSE), Re-appeared in Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W)

Abstract

Conventional error-correcting codes (ECCs) and system-level fault-tolerance mechanisms are currently treated as separate abstraction layers. This can reduce the overall efficacy of error detection and correction (EDAC) capabilities, impacting the reliability of memories by causing crashes or silent data corruption. To address this shortcoming, we propose Software-Defined ECC (SWD-ECC), a new class of heuristic techniques to recover from detected but uncorrectable errors (DUEs) in memory. It uses available side information to estimate the original message by first filtering and then ranking the possible candidate codewords for a DUE. SWD-ECC does not incur any hardware or software overheads in the cases where DUEs do not occur. As an exemplar for SWD-ECC, we show through offline analysis on SPEC CPU2006 benchmarks how to heuristically recover from 2-bit DUEs in MIPS instruction memory using a common (39,32) single-error-correcting, double-error-detecting (SECDED) code. We first apply coding theory to compute all of the candidate codewords for a given DUE. Second, we filter out the candidates that are not legal MIPS instructions, increasing the chance of successful recovery. Finally, we choose a valid candidate whose logical operation (e.g., add or load) occurs most frequently in the application binary image. Our results show that on average, 34% of all possible 2-bit DUEs in the evaluated set of instructions can be successfully recovered using this heuristic recovery strategy. If a DUE affects the bit fields used for instruction decoding, we are able to recover correctly up to 99% of the time. We believe these results to be a significant achievement compared to an otherwise-guaranteed crash which can be undesirable in many systems and applications. Moreover, there is room for future improvement of this result with more sophisticated uses of side information. We look forward to future work in this area.
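
A toy version of the SWD-ECC recovery flow, shrunk from the paper's (39,32) SECDED code to an (8,4) extended Hamming code: enumerate the codewords at distance 2 from the received word, then filter them with a validity predicate. The predicate here is a made-up stand-in for the "legal MIPS instruction" check.

    from itertools import product

    G = [(1, 0, 0, 0, 0, 1, 1),
         (0, 1, 0, 0, 1, 0, 1),
         (0, 0, 1, 0, 1, 1, 0),
         (0, 0, 0, 1, 1, 1, 1)]             # Hamming (7,4) generator rows

    def codewords():
        for m in product((0, 1), repeat=4):
            c = tuple(sum(m[i] * G[i][j] for i in range(4)) % 2 for j in range(7))
            yield c + (sum(c) % 2,)         # extended Hamming: overall parity

    def dist(a, b):
        return sum(u != v for u, v in zip(a, b))

    code = list(codewords())
    sent = code[5]
    received = list(sent)
    received[0] ^= 1; received[5] ^= 1      # 2-bit detected-but-uncorrectable error
    candidates = [c for c in code if dist(c, received) == 2]
    plausible = [c for c in candidates if c[0] == 0]   # hypothetical legality filter
    print(len(candidates), "candidates,", len(plausible), "survive filtering")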

Synchronizing Files From a Large Number of Insertions and Deletions

Frederic Sala, Clayton Schoeny, Nicolas Bitouze, Lara Dolecek
Journal Paper IEEE Transactions on Information Theory, vol. 64, no. 6, Apr. 2016

Abstract

Developing efficient algorithms to synchronize between different versions of files is an important problem with numerous applications. We consider the interactive synchronization protocol introduced by Yazdi and Dolecek, based on an earlier synchronization algorithm by Venkataramanan et al. Unlike preceding synchronization algorithms, Yazdi and Dolecek's algorithm is specifically designed to handle a number of deletions linear in the length of the file. We extend this algorithm in three ways. First, we handle nonbinary files. Second, these files contain symbols chosen according to nonuniform distributions. Finally, the files are modified by both insertions and deletions. We take into consideration the collision entropy of the source and refine the matching graph developed by Yazdi and Dolecek by appropriately placing weights on the matching graph edges. We compare our protocol with the widely used synchronization software rsync, and with the synchronization protocol by Venkataramanan et al. In addition, we provide tradeoffs between the number of rounds of communication and the total amount of bandwidth required to synchronize the two files under various implementation choices of the baseline algorithm. Finally, we show the robustness of the protocol under imperfect knowledge of the properties of the edit channel, which is the expected scenario in practice.

(This manuscript contains work previously presented at the 2013 ISIT, 2013 Allerton, and 2014 Asilomar conferences.)

Error Resilience and Energy Efficiency: An LDPC Decoder Design Study

Philip Schlafer, Chu-Hsiang Huang, Clayton Schoeny, Christian Weis, Yao Li, Norbert Wehn, Lara Dolecek
Conference Paper IEEE Design, Automation, and Test in Europe (DATE), Dresden, Germany, Mar. 2016

Abstract

Iterative decoding algorithms for low-density parity check (LDPC) codes have an inherent fault tolerance. In this paper, we exploit this robustness and optimize an LDPC decoder for high energy efficiency: we reduce energy consumption by opportunistically increasing error rates in decoder memories, while still achieving successful decoding in the final iteration. We develop a theory-guided unequal error protection (UEP) technique. UEP is implemented using dynamic voltage scaling that controls the error probability in the decoder memories on a per iteration basis. Specifically, via a density evolution analysis of an LDPC decoder, we first formulate the optimization problem of choosing an appropriate error rate for the decoder memories to achieve successful decoding under minimal energy consumption. We then propose a low complexity greedy algorithm to solve this optimization problem and map the resulting error rates to the corresponding supply voltage levels of the decoder memories in each iteration of the decoding algorithm. We demonstrate the effectiveness of our approach via ASIC synthesis results of a decoder for the LDPC code in the IEEE 802.11ad standard, implemented in 28nm FD-SOI technology. The proposed scheme achieves an increase in energy efficiency of up to 40% compared to the state-of-the-art solution.

Asymmetric ECCs for Flash in High-Radiation Environments

Frederic Sala, Clayton Schoeny, Dariush Divsalar, Lara Dolecek
Conference Paper IEEE Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, Nov. 2015

Abstract

Research in coding for Flash devices typically deals with errors that occur during normal device operation. This work is instead concerned with codes that protect Flash devices that are experiencing errors caused by exposure to large radiation dosages. Such errors occur when, for example, Flash is used as onboard memory in satellites and space probes. In a previous paper, we examined the effects of errors on MLC Flash cells and introduced appropriate code constructions. In this work, we focus on the single level cell (SLC) case. We model the errors as being the result of a binary asymmetric channel (BASC). We show how to modify existing error-correcting codes in order to produce codes capable of correcting radiation-induced errors. Our approach focuses on finding and dealing with miscorrections, where an asymmetric decoder incorrectly deals with errors it is not designed to handle.
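
The BASC model is easy to simulate; the sketch below uses invented crossover probabilities, with one direction dominating as in the radiation-induced setting (which direction dominates depends on the bit-to-charge mapping).

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.integers(0, 2, size=100_000)
    p10, p01 = 0.05, 0.001          # made-up asymmetric crossover probabilities
    u = rng.random(x.size)
    flips = np.where(x == 1, u < p10, u < p01)
    y = x ^ flips
    print("1->0 errors:", int(((x == 1) & (y == 0)).sum()))
    print("0->1 errors:", int(((x == 0) & (y == 1)).sum()))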

Analysis and Coding Schemes for the Flash Normal-Laplace Mixture Channel

Clayton Schoeny, Frederic Sala, Lara Dolecek
Conference Paper IEEE International Symposium on Information Theory (ISIT), Hong Kong, Jun. 2015

Abstract

Error-correcting codes are a critical need for modern flash memories. Such codes are typically designed under the assumption that the voltage threshold distributions in flash cells are Gaussian. This assumption, however, is not realistic. This is particularly the case late in the lifetime of flash devices. A recent work by Parnell et al. provides a parameterized model of MLC (2-bit cell) flash which accurately represents the voltage threshold distributions for an operating period up to 10 times longer than the device's specified lifetime. We analyze this model from an information-theoretic perspective and compute capacity for the resulting channel. We extrapolate the channel from an MLC to a TLC (3-bit cell) model and we characterize the resulting errors. We show that errors under the improved model are highly asymmetric. We introduce a code construction explicitly designed to exploit the asymmetric nature of these errors, and measure its improvement against existing codes at large P/E cycle counts.
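
The capacity computation for the full normal-Laplace mixture channel is involved, but the flavor can be seen on the two-input toy case: the capacity of a binary asymmetric channel, found by maximizing I(X;Y) = H(Y) - H(Y|X) over the input distribution. A numerical sketch:

    import numpy as np

    def h(p):
        # Binary entropy in bits, clipped away from 0 and 1.
        p = np.clip(p, 1e-12, 1 - 1e-12)
        return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

    def bac_capacity(p01, p10):
        best = 0.0
        for q in np.linspace(0.0, 1.0, 10_001):          # q = P(X = 1)
            py1 = (1 - q) * p01 + q * (1 - p10)          # P(Y = 1)
            hyx = (1 - q) * h(p01) + q * h(p10)          # H(Y|X)
            best = max(best, h(py1) - hyx)               # I(X;Y)
        return best

    print(round(bac_capacity(0.001, 0.05), 4))           # bits per channel use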

Three Novel Combinatorial Theorems for the Insertion/Deletion Channel

Frederic Sala, Ryan Gabrys, Clayton Schoeny, Lara Dolecek
Conference Paper IEEE International Symposium on Information Theory (ISIT), Hong Kong, Jun. 2015

Abstract

Although the insertion/deletion problem has been studied for more than fifty years, many results still remain elusive. The goal of this work is to present three novel theorems with a combinatorial flavor that shed further light on the structure and nature of insertions/deletions. In particular, we give an exact result for the maximum number of common supersequences between two sequences, extending older work by Levenshtein. We then generalize this result for sequences that have different lengths. Finally, we compute the exact neighborhood size for the binary circular (alternating) string Cn = 0101 ... 01. In addition to furthering our understanding of the insertion/deletion channel, these theorems can be used as building blocks in other applications. One such application is developing improved lower bounds on the sizes of insertion/deletion-correcting codes.

Asymmetric Error-Correcting Codes for Flash Memories in High-Radiation Environments

Frederic Sala, Clayton Schoeny, Dariush Divsalar, Lara Dolecek
Conference Paper IEEE International Symposium on Information Theory (ISIT), Hong Kong, Jun. 2015

Abstract

Research works exploring coding for Flash memories typically seek to correct errors taking place during normal device operation. In this paper, we study the design of codes that protect Flash devices dealing with the unusual class of errors caused by exposure to large radiation dosages. Significant radiation exposure can take place, for example, when Flash is used as on-board memory in satellites and space probes. We introduce an error model that captures the effects of radiation exposure. Such errors are asymmetric, with the additional feature that the degree (and direction) of asymmetry depends on the stored sequence. We develop an appropriate distance and an upper bound on the sizes of codes which correct such errors. We introduce and analyze several simple code constructions.

Efficient File Synchronization: Extensions and Simulations

Clayton Schoeny, Nicolas Bitouze, Frederic Sala, Lara Dolecek
Conference Paper IEEE Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, Nov. 2014

Abstract

We study the problem of synchronizing two files X and Y at two distant nodes A and B that are connected through a two-way communication channel. In our setup, Y is an edited version of X; edits are insertions and deletions that are potentially numerous. We previously proposed an order-wise optimal synchronization protocol for reconstructing file X at node B with an exponentially low probability of error. In this paper, we introduce an adaptive algebraic code to the synchronization protocol in order to increase efficiency in the presence of substitution errors. In addition, we expand on our previous results by presenting experimental results from several scenarios including different types of files and a variety of realistic error patterns.

Efficient File Synchronization

Clayton Schoeny
Thesis Master's Thesis, UCLA, Jun. 2014

Abstract

We study the synchronization of two files X and Y at two distant users A and B that are connected through a two-way communication channel. We previously proposed a synchronization protocol for reconstructing X at user B with exponentially low probability of error. We have proven the order-wise optimality of the protocol where the binary file Y is the original binary file X modified through i.i.d. insertion and deletion edits. In this thesis, we expand on previous results by presenting experimental results from numerous scenarios including different types of files and a variety of realistic error patterns. In addition, we introduce novel improvements to the synchronization protocol to further increase efficiency.

Data Science Mini-Projects

Listed below are links to a variety of my data-science-related mini-projects. These activities are entirely independent of my previous and current industry and academic funding sources and research ventures. In these fun, small weekend projects, I explore and analyze various sources of data with a range of machine learning algorithms.

Here is my GitHub repo containing all the projects.

Jupyter Notebooks

  • UFC Fantasy Forecasting

    Kernel density estimation -- Object-oriented programming -- Monte Carlo simulation -- Linear programming -- Web scraping

  • Daily Fantasy Golf Prediction (see the lineup-optimization sketch after this list)

    Poisson random variables -- Monte Carlo simulation -- Linear programming -- Web scraping

  • FIFA18 Dataset Exploration

    Nearest Neighbors -- Bootstrapping -- PCA

  • Facebook Mutual Friends Analysis

    Graph Theory -- Communities -- NetworkX

  • Book Genre Prediction

    TF-IDF -- Matrix Factorization -- Cosine Similarity
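
As a flavor of the linear-programming step shared by the two fantasy-sports notebooks, here is a minimal lineup optimizer: pick six golfers maximizing projected points under a salary cap. The numbers are invented, and the `integrality` option requires SciPy 1.9 or newer.

    import numpy as np
    from scipy.optimize import linprog

    points = np.array([71.2, 68.5, 66.0, 64.1, 60.3, 59.8, 55.0, 52.4])
    salary = np.array([11.3,  9.8,  9.1,  8.4,  7.2,  6.9,  6.1,  5.5])  # $k
    cap, picks = 50.0, 6

    res = linprog(
        c=-points,                                   # linprog minimizes
        A_ub=[salary], b_ub=[cap],                   # salary cap
        A_eq=[np.ones_like(points)], b_eq=[picks],   # exactly six picks
        bounds=[(0, 1)] * len(points),
        integrality=np.ones(len(points)),            # 0/1 decision variables
        method="highs",
    )
    lineup = np.flatnonzero(res.x > 0.5)
    print("lineup:", lineup, "salary:", salary[lineup].sum())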

Gallery


I've had an amazing time throughout my studies thanks to friends, family, and the great locations I've been fortunate enough to visit. Here is a small snapshot of my adventures in recent years.

[Photo gallery]

Contact

Feel free to reach out to me -- email is preferred.

  • cschoeny@gmail.com
  • linkedin.com/in/claytonschoeny